(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)




(19) World Intellectual Property Organization 

International Bureau 

(43) International Publication Date 
18 September 2003 (18.09.2003)




PCT 




(10) International Publication Number 

WO 03/076896 A2



(51) International Patent Classification: G01N

(21) International Application Number: PCT/US03/06850 

(22) International Filing Date: 6 March 2003 (06.03.2003) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data:
60/362,473    6 March 2002 (06.03.2002)    US



(63) Related by continuation (CON) or continuation-in-part 
(CIP) to earlier application: 

US 60/362,473 (CIP)

Filed on 6 March 2002 (06.03.2002) 

(71) Applicant (for all designated States except US): JOHNS
HOPKINS UNIVERSITY [US/US]; 720 Rutland Avenue,
Baltimore, MD 21205 (US).

(72) Inventors; and

(75) Inventors/Applicants (for US only): CHAN, Daniel, W.
[US/US]; 12925 Wexford Park, Clarksville, MD 21029-1401
(US). ZHANG, Zhen [CN/US]; 14104 Big Branch Drive,
Dayton, MD 21036 (US). LI, Jinong [US/US]; 9653 Susies Way,
Ellicott City, MD 21042 (US).

(74) Agents: CORLESS, Peter, F. et al.; Edwards & Angell 
LLP, P.O. Box 9169, Boston, MA 02209 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SC, SD, SE,
SG, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
VC, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO,
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published: 

— without international search report and to be republished 
upon receipt of that report 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: USE OF BIOMARKERS TO DETECT BREAST CANCER 

(57) Abstract: The present invention relates to a method for identification of tumor biomarkers for breast cancer, with high
specificity and sensitivity. Preferred methods of the invention include qualifying breast cancer status in a subject comprising: (a)
measuring at least one of the disclosed biomarkers in a sample from the subject; and (b) correlating the measurement with breast
cancer status. The invention further relates to kits for qualifying breast cancer status in a subject.

USE OF BIOMARKERS TO DETECT BREAST CANCER

The present application claims the benefit of U.S. provisional application
number 60/362,473 filed March 6, 2002, and which is incorporated herein by 
reference in its entirety. 

FIELD OF THE INVENTION

The invention provides for high specificity and sensitivity in the detection and
identification of biomarkers important for the diagnosis, prognosis and identification
of tumor stage progression in breast cancer. The plasma protein profiles of breast
cancer patients are distinguished from those of non-neoplastic individuals using biochip
arrays and SELDI analysis. This technique provides a simple yet sensitive approach to
diagnose breast cancer using plasma samples.

BACKGROUND OF THE INVENTION 

Based on the National Cancer Institute (NCI) incidence and National Center
for Health Statistics (NCHS) mortality data, the American Cancer Society estimated
that breast cancer would be the most commonly diagnosed cancer among women in
2002 in the United States. It was expected to account for 31 percent (203,500) of all
new cancer cases among women, and 39,600 women were expected to die from this
disease. Jemal A, Thomas A, Murray T, Thun M. Cancer statistics, 2002. CA Cancer J Clin.
2002;52:23-47. Presymptomatic screening to detect early-stage cancer while it is still
resectable with potential for cure can greatly reduce breast cancer related mortality.
Unfortunately, only about 50% of breast cancers are localized at the time of
diagnosis. National Cancer Institute. Cancer Net PDQ Cancer Information
Summaries. Monographs on "Screening for breast cancer."
http://cancernet.nci.nih.gov/pdq.html (Updated January 2001). Despite the availability and
recommended use of mammography for women age 40 and older as a routine
screening method, its effectiveness in reducing overall population mortality from
breast cancer is still being investigated. K. Antman et al., JAMA. 1999;281:1470-2.
Currently, serum tumor markers that have been investigated for use in breast cancer
detection still lack the sensitivity and specificity needed to be applicable in detecting
early-stage carcinoma in a large population. The FDA-approved tumor markers, such
as CA15.3 and CA27.29, are only recommended for monitoring therapy of advanced
breast cancer or recurrence. D.W. Chan et al., J Clin. Oncology. 1997;15:2322-2328.
New biomarkers that could be used individually or in combination with an existing
modality for cost-effective screening of breast cancer are still urgently needed.


SUMMARY OF THE INVENTION 

The present invention provides, for the first time, novel protein markers that
are differentially present in the samples of tumors at different clinical stages. The
measurement of these markers, alone or in combination, in patient samples provides
information that a diagnostician can correlate with a probable diagnosis of different
clinical stages of human cancer. This is especially important when trying to identify
biomarkers in pre-invasive tumors, where such a diagnosis would be life saving.

Protein markers of the invention can be characterized in one or more of
several respects.

In particular, in one aspect, markers of the invention are characterized by
molecular weights under the conditions specified herein, particularly as determined by
mass spectral analysis.

In another aspect, the markers can be characterized by features of the markers'
mass spectral signature such as size (including area) and/or shape of the markers'
spectral peaks, features including proximity, size and shape of neighboring peaks, etc.
In yet another aspect, the markers can be characterized by affinity binding
characteristics, particularly the ability to bind to an IMAC Ni adsorbent under
specified conditions.

In preferred embodiments, markers of the invention may be characterized by
each of such aspects, i.e. molecular weight, mass spectral signature and IMAC Ni
adsorbent binding.

Protein biomarkers of the invention include the following designated herein as
Markers I through XIV. Molecular weights as measured by mass spectrometry are
also specified for each marker:





Marker I (BC1): having a molecular weight of about 4.3 kD
Marker II (BC2): having a molecular weight of about 8.1 kD
Marker III (BC3): having a molecular weight of about 8.9 kD
Marker IV: having a molecular weight of about [value illegible in source] kD
Marker V: having a molecular weight of about [value illegible in source] kD
Marker VI: having a molecular weight of about 8.3 kD
Marker VII: having a molecular weight of about 17 kD
Marker VIII: having a molecular weight of about 18 kD
Marker IX: having a molecular weight of about 10.2 kD
Marker X: having a molecular weight of about 6.0 kD
Marker XI: having a molecular weight of about 8.4 kD
Marker XII: having a molecular weight of about 7.5 kD
Marker XIII: having a molecular weight of about 9.4 kD
Marker XIV: having a molecular weight of about 16.3 kD



Markers I through XIV also are characterized by their mass spectral signature.
The mass spectra of each of Markers I through XIV are set forth in Figures 1A
through 1N respectively.

Each of Markers I through XIV also is characterized by its ability to bind to an
IMAC Ni adsorbent after washing with phosphate buffered saline, as specified herein.

In one aspect, the present invention provides a method of qualifying breast
cancer status in a subject comprising:

(a) measuring at least one biomarker in a sample from the subject,
wherein the marker is selected from Marker I (BC1); Marker II (BC2);
Marker III (BC3); Marker IV; Marker V; Marker VI; Marker VII; Marker VIII;
Marker IX; Marker X; Marker XI; Marker XII; Marker XIII; and Marker XIV, and
combinations thereof; and

(b) correlating the measurement with breast cancer status.

In certain methods, the measuring step comprises detecting the presence or
absence of markers in the sample. In other methods, the measuring step comprises
quantifying the amount of marker(s) in the sample. In other methods, the measuring
step comprises qualifying the type of biomarker in the sample.

The invention also relates to methods wherein the measuring step comprises:
depositing a subject sample of blood or a blood derivative on a surface of a substrate
comprising capture reagents that bind the protein biomarkers. The subject sample
may be optionally fractionated (e.g., on a pH gradient) prior to such depositing and
the collected and selected fractions deposited on the substrate. The blood derivative
is, e.g., serum or plasma. In preferred embodiments, the substrate is a SELDI probe
comprising an IMAC Ni surface and wherein the protein biomarkers are detected by
SELDI. In other embodiments, the substrate is a SELDI probe comprising biospecific
affinity reagents that bind such Markers I through XIV as identified above and
wherein the protein biomarkers are detected by SELDI. In other embodiments, the
substrate is a microtiter plate comprising biospecific affinity reagents that bind one or
more of Markers I through XIV as identified above and the one or more protein
biomarkers are detected by immunoassay.

In certain embodiments, the methods further comprise managing subject
treatment based on the status determined by the method. For example, if the result of
the methods of the present invention is inconclusive or there is reason that
confirmation of status is necessary, the physician may order more tests. Alternatively,
if the status indicates that surgery is appropriate, the physician may schedule the
patient for surgery. Likewise, if the result of the test is positive, e.g., the status is late
stage breast cancer or if the status is otherwise acute, no further action may be
warranted. Furthermore, if the results show that treatment has been successful, no
further management may be necessary.

The invention also provides for such methods where the at least one
biomarker is measured again after subject management. In these instances, the step of
managing subject treatment is then repeated and/or altered depending on the result
obtained.

The term "breast cancer status" refers to the status of the disease in the patient.
Examples of types of breast cancer statuses include, but are not limited to, the
subject's risk of cancer, the presence or absence of disease, the stage of disease in a
patient, and the effectiveness of treatment of disease. Other statuses and degrees of
each status are known in the art.

Markers of the invention can be resolved from other proteins in a sample by
using a variety of fractionation techniques, e.g., chromatographic separation coupled
with mass spectrometry, or by traditional immunoassays. In preferred embodiments,
the method of resolution involves Surface-Enhanced Laser Desorption/Ionization
("SELDI") mass spectrometry, in which the surface of the mass spectrometry probe
comprises adsorbents that bind the markers.

In other preferred embodiments, comparative protein profiles are generated
using the ProteinChip Biomarker System from patients diagnosed with breast cancer
and from patients without known neoplastic diseases. A subset of biomarkers was
selected based on collaborative results from supervised analytical methods. Preferred
analytical methods include ProPeak (3Z Informatics, SC), which implements the
linear version of the Unified Maximum Separability Analysis (UMSA) algorithm, and the
Classification And Regression Tree (CART), implemented in Biomarker Pattern
Software V4.0 (BPS) (Ciphergen, CA).

In a preferred embodiment, the analytical methods are used individually and in
cross-comparison to screen for peaks that are most contributory towards the
discrimination between breast cancer patients and the non-cancer controls.

In another aspect, the biomarkers were purified and identified. The selected 
biomarkers, together with the tumor markers CA15.3 and CA27.29, were evaluated
individually and in combination through multivariate logistic regression. 
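
Combining several markers through logistic regression can be pictured as forming a single composite index from the individual peak intensities. The following is only a minimal sketch of the general form of such an index. The function name `composite_index`, the intercept and the weights are hypothetical placeholders: no fitted coefficients are stated in this section, and real values would have to come from a regression on training samples.

```python
import math


def composite_index(intensities, intercept, weights):
    """Logistic composite of several marker intensities.

    intensities and weights are dicts keyed by marker name. The intercept
    and weights are hypothetical inputs; fitted values are not given in
    this document.
    """
    linear = intercept + sum(weights[name] * intensities[name] for name in weights)
    # The logistic function maps the linear score onto the range (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder numbers for illustration only, not measured or fitted values.
example_weights = {"BC1": -1.0, "BC2": 0.5, "BC3": 2.0}
example_sample = {"BC1": 0.2, "BC2": 0.4, "BC3": 0.9}
index = composite_index(example_sample, intercept=-1.0, weights=example_weights)
```

A single number between 0 and 1 per sample is convenient because it can be compared against one cut-off, in the same way as an individual marker.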

While the absolute identity of these markers is not yet known, such knowledge
is not necessary to measure them in a patient sample, because they are sufficiently
characterized by, e.g., mass and by affinity characteristics. It is noted that molecular
weight and binding properties are characteristic properties of these markers and not
limitations on means of detection or isolation. Furthermore, using the methods
described herein or other methods known in the art, the absolute identity of the
markers can be determined.

Preferred methods for detection and diagnosis of cancer comprise detecting at
least one or more protein biomarkers in a subject sample, and; correlating the
detection of one or more protein biomarkers with a diagnosis of cancer, wherein the
correlation takes into account the detection of one or more biomarkers in each
diagnosis, as compared to normal subjects, wherein the one or more protein markers
are selected from Marker I (BC1); Marker II (BC2); Marker III (BC3); Marker IV;
Marker V; Marker VII; Marker VIII; Marker IX; Marker X; Marker XI; Marker XII;
Marker XIII; and Marker XIV, and combinations thereof.

A preferred method for detection, diagnosis and determination of the clinical
stage of breast cancer comprises detecting at least one or more protein biomarkers in
a subject sample, wherein the protein markers are selected from Marker I (BC1);
Marker II (BC2); Marker III (BC3), and combinations thereof;

and; correlating the detection of one or more protein biomarkers with a
diagnosis of breast cancer, wherein the correlation takes into account the detection of
one or more protein biomarkers in each diagnosis, as compared to normal subjects.

A preferred method for detection, diagnosis and determination of the earliest
clinical stages of breast cancer comprises detecting at least one or more protein
biomarkers in a subject sample, wherein the protein markers are selected from Marker
I (BC1); Marker II (BC2); Marker III (BC3), and combinations thereof;
and; correlating the detection of one or more protein biomarkers with a
diagnosis of breast cancer, wherein the correlation takes into account the detection of
one or more protein biomarkers in each diagnosis, as compared to normal subjects.
Preferably, the markers are detected at Stage 0, which is the earliest stage of breast
cancer. Results showing the sensitivity and specificity of detecting the markers in the
early stages are described in the Examples which follow.

In other preferred embodiments, a plurality of the biomarkers are detected,
preferably at least one of the biomarkers is detected, more preferably at least two of
the biomarkers are detected, most preferably at least three of the biomarkers are
detected. The most preferred markers are:

Marker I (BC1): having a molecular weight of about 4.3 kD
Marker II (BC2): having a molecular weight of about 8.1 kD
Marker III (BC3): having a molecular weight of about 8.9 kD

In a preferred method for diagnosing and differentiating between the different
malignant stages of cancer, the method comprises using a biochip array to generate a
first set of data representative of the first set of biological markers; and evaluating the
first set of data detecting at least one or more protein biomarkers in a subject sample,
and; correlating the detection of one or more protein biomarkers with a progressive
malignant stage of cancer as compared to normal subjects.

In one aspect of the invention, one or more protein biomarkers are used in
diagnosing and differentiating between the different
malignant stages of cancer, wherein the one or more protein markers are selected
from Marker I (BC1); Marker II (BC2); Marker III (BC3); Marker IV; Marker V;
Marker VII; Marker VIII; Marker IX; Marker X; Marker XI; Marker XII; Marker
XIII; and Marker XIV, and combinations thereof.
In another aspect, the present invention provides for a method for diagnosing
and differentiating between the different malignant stages of breast cancer, wherein
the method comprises:

detecting at least one or more protein biomarkers in a subject sample,
and; correlating the detection of one or more protein biomarkers with a
diagnosis of breast cancer, wherein the correlation takes into account
the detection of one or more protein biomarkers in each diagnosis, as
compared to normal subjects.
In another aspect of the invention, a single biomarker is used to differentiate
between the different malignant stages of cancer. Also provided is a single biomarker
to differentiate between the different malignant stages of cancer in combination with
one or more known cancer biomarkers for diagnosing cancer such as, for example, the
breast cancer markers CA 15.3 and CA 27.29. It is preferred that one or more protein
biomarkers are used in comparing protein profiles from patients susceptible to, or
suffering from cancer, such as breast cancer, with normal subjects.

In another aspect of the invention, the patient sample is selected from the
group consisting of blood, blood plasma, serum, urine, tissue, cells, organs and
vaginal fluids.

Preferred detection methods include use of a biochip array. Biochip arrays
useful in the invention include protein and nucleic acid arrays. One or more markers
are immobilized on the biochip array and subjected to laser ionization to detect the
molecular weight of the markers. Analysis of the markers is, for example, by
molecular weight of the one or more markers against a threshold intensity that is
normalized against total ion current. Preferably, logarithmic transformation is used
for reducing peak intensity ranges to limit the number of markers detected.

In another preferred method, data is generated on immobilized subject samples
on a biochip array, by subjecting said biochip array to laser ionization and detecting
intensity of signal for mass/charge ratio; and, transforming the data into computer
readable form; and executing an algorithm that classifies the data according to user
input parameters, for detecting signals that represent markers present in breast cancer
patients and are lacking in non-cancer subject controls.
Preferably the biochip surfaces are, for example, ionic, anionic, comprised of
immobilized nickel ions, comprised of a mixture of positive and negative ions,
or comprise one or more antibodies, single or double stranded nucleic acids,
proteins, peptides or fragments thereof, amino acid probes, or phage display
libraries.

In other preferred methods one or more of the markers are detected using laser
desorption/ionization mass spectrometry, comprising: providing a probe adapted for
use with a mass spectrometer comprising an adsorbent attached thereto;
contacting the subject sample with the adsorbent; and desorbing and ionizing the
marker or markers from the probe and detecting the desorbed/ionized markers with
the mass spectrometer.

Preferably, the laser desorption/ionization mass spectrometry comprises:
providing a substrate comprising an adsorbent attached thereto; contacting the subject
sample with the adsorbent; placing the substrate on a probe adapted for use with a
mass spectrometer comprising an adsorbent attached thereto; and desorbing and
ionizing the marker or markers from the probe and detecting the desorbed/ionized
marker or markers with the mass spectrometer.

In another embodiment, various compositions are provided to further aid in the 
diagnosis of breast cancer: 

A composition comprising Marker I and one or more biomarkers selected from
Markers II through XIV.

A composition comprising Marker II and one or more biomarkers selected from
Markers I, III through XIV.

A composition comprising Marker III and at least one more biomarker
selected from Markers I, II, IV through XIV.

A composition comprising Marker IV and at least one more biomarker
selected from Markers I, II, III, V through XIV.

A composition comprising Marker V and at least one more biomarker
selected from Markers I, II, III, IV, VI through XIV.

A composition comprising Marker VI and one or more biomarkers selected from
Markers I, II, III, IV, V through XIV.

A composition comprising Marker VII and one or more biomarkers selected from
Markers I, II, III, IV, V, VI, VIII through XIV.

A composition comprising Marker VIII and one or more biomarkers selected
from Markers I, II, III, IV, V, VI, VII through XIV.

A composition comprising Marker IX and one or more biomarkers selected from
Markers I, II, III, IV, V, VI, VII, VIII, X through XIV.

A composition comprising Marker X and one or more biomarkers selected from
Markers I through IX, XI through XIV.

A composition comprising Marker XI and one or more biomarkers selected from
Markers I through X, XII through XIV.

A composition comprising Marker XII and one or more biomarkers selected from
Markers I through XI, XIII through XIV.

A composition comprising Marker XIII and one or more biomarkers selected
from Markers I through XII, XIV.

A composition comprising Marker XIV and one or more biomarkers selected
from Markers I through XIII. Preferably, in these compositions, the markers are
substantially pure and/or isolated, e.g., from a serum sample.

For the mass values of the markers disclosed herein, the mass accuracy of the
spectral instrument is considered to be about within +/- 0.15 percent of the disclosed
molecular weight value. In addition to such recognized accuracy variations of the
instrument, the spectral mass determination can vary within resolution limits of from
about 400 to 1000 m/dm, where m is mass and dm is the mass spectral peak width at
0.5 peak height. Those mass accuracy and resolution variances associated with the
mass spectral instrument and operation thereof are reflected in the use of the term
"about" in the disclosure of the mass of each of Markers I through XIV. It is also
intended that such mass accuracy and resolution variances, and thus the meaning of the
term "about" with respect to the mass of each of Markers I through XIV, is inclusive of
variants of the markers as may exist due to sex and/or ethnicity of the subject and the
particular cancer or origin or stage thereof.
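
The stated accuracy and resolution figures translate directly into numeric windows. The following is a minimal sketch, assuming that the rounded "about" values can be treated as nominal masses in daltons; the names `MARKER_MASS_DA`, `mass_window`, `peak_width_at_half_height` and `match_markers` are illustrative and not part of any instrument software. Markers IV and V are left out because their masses are not legible in this copy.

```python
# Nominal masses in daltons, taken from the marker list in this document.
# Treating the rounded values as exact nominal masses is an assumption.
MARKER_MASS_DA = {
    "I (BC1)": 4300.0,
    "II (BC2)": 8100.0,
    "III (BC3)": 8900.0,
    "VI": 8300.0,
    "VII": 17000.0,
    "VIII": 18000.0,
    "IX": 10200.0,
    "X": 6000.0,
    "XI": 8400.0,
    "XII": 7500.0,
    "XIII": 9400.0,
    "XIV": 16300.0,
}

MASS_ACCURACY = 0.0015  # +/- 0.15 percent, as stated in the text


def mass_window(nominal_da, accuracy=MASS_ACCURACY):
    """Return the (low, high) mass range implied by the stated accuracy."""
    delta = nominal_da * accuracy
    return nominal_da - delta, nominal_da + delta


def peak_width_at_half_height(mass_da, resolution):
    """Peak width dm at half height for a resolution expressed as m/dm."""
    return mass_da / resolution


def match_markers(observed_da, accuracy=MASS_ACCURACY):
    """List the markers whose accuracy window contains the observed mass."""
    hits = []
    for name, nominal in MARKER_MASS_DA.items():
        low, high = mass_window(nominal, accuracy)
        if low <= observed_da <= high:
            hits.append(name)
    return hits
```

For example, a nominal mass of 8,900 Da gives a window of roughly 8,886.65 to 8,913.35 Da, and a resolution of 400 at 8,000 Da corresponds to a peak width of 20 Da at half height.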
In the discovery of the biomarkers for breast cancer using the methods of this
invention, the preferred analysis of the data is by logarithmic transformation for
reducing peak intensity ranges to limit the number of markers detected, and data
obtained from the peak intensities of each sample are projected as individual points
onto a three-dimensional component space. The component spaces are linear
combinations of the peak intensities. Each component space corresponds to directions
along which about two pre-specified groups of data achieve maximum separation as
determined by an interactive three dimensional display on a computer monitor. A
significance score for each individual mass peak is computed and each individual
mass peak is ranked according to its collective contribution towards the maximal
separation of two pre-specified groups of data. A significance score may be a positive
or negative value, wherein a positive score correlates with an increased expression of
the corresponding mass peak obtained from a patient sample with cancer and a
negative score correlates with a decreased expression of a corresponding mass peak
obtained from a cancer patient sample.

The immobilized molecules are subjected to laser ionization to detect mass 
peaks of each biomarker and the mass peaks of the one or more markers are analyzed
against a threshold intensity that is normalized against total ion current. Preferred 
selected mass peaks are between about 2 K to about 150 K. 
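
The thresholding step above, together with the logarithmic transformation mentioned earlier in this summary, can be sketched as follows. This is an illustration under stated assumptions rather than the processing actually used: the total ion current is approximated by the sum of the listed peak intensities, "2 K to about 150 K" is read as 2,000 to 150,000 daltons, and the function names and the `offset` parameter are hypothetical.

```python
import math


def normalize_to_total_ion_current(intensities):
    """Divide each intensity by the summed intensity of the spectrum."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return [value / total for value in intensities]


def select_mass_peaks(peaks, threshold, low_da=2000.0, high_da=150000.0):
    """Keep (mass, normalized intensity) pairs inside the mass range
    whose normalized intensity reaches the threshold.

    peaks is a list of (mass in daltons, raw intensity) tuples.
    """
    masses = [mass for mass, _ in peaks]
    normalized = normalize_to_total_ion_current([inten for _, inten in peaks])
    return [
        (mass, value)
        for mass, value in zip(masses, normalized)
        if low_da <= mass <= high_da and value >= threshold
    ]


def log_transform(values, offset=1.0):
    """Natural-log transform; the offset (an assumption) avoids log(0)."""
    return [math.log(value + offset) for value in values]
```

Normalizing before thresholding makes the cut-off comparable between spectra that differ in overall signal, and the log transform compresses the range of peak intensities.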

In a preferred embodiment, a fixed percentage of biomarker samples are
randomly excluded during the analysis of mass peaks, wherein a median and mean
rank is determined for each peak. The analysis is run at least about 100 times.
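
The repeated random-exclusion analysis can be sketched as below. The scoring function is a deliberately simple stand-in (absolute difference of group means) and is not the UMSA significance score; the 30 percent exclusion fraction, the seed and all function names are likewise assumptions made only for illustration.

```python
import random
import statistics


def score_peaks(cancer, control):
    """Stand-in score per peak: absolute difference of group means.
    This is NOT the UMSA score; it only makes the sketch runnable."""
    scores = []
    for j in range(len(cancer[0])):
        mean_cancer = statistics.fmean([row[j] for row in cancer])
        mean_control = statistics.fmean([row[j] for row in control])
        scores.append(abs(mean_cancer - mean_control))
    return scores


def ranks_from_scores(scores):
    """Rank 1 is assigned to the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def subsample(rows, exclude_fraction, rng):
    """Randomly keep a fixed share of the samples (at least one)."""
    keep = max(1, round(len(rows) * (1 - exclude_fraction)))
    return rng.sample(rows, keep)


def bootstrap_ranks(cancer, control, exclude_fraction=0.3, runs=100, seed=0):
    """Re-rank the peaks over many random subsets and summarize each
    peak by its mean rank, median rank and rank standard deviation."""
    rng = random.Random(seed)
    per_peak = [[] for _ in cancer[0]]
    for _ in range(runs):
        kept_cancer = subsample(cancer, exclude_fraction, rng)
        kept_control = subsample(control, exclude_fraction, rng)
        ranks = ranks_from_scores(score_peaks(kept_cancer, kept_control))
        for j, rank in enumerate(ranks):
            per_peak[j].append(rank)
    return [
        {
            "mean_rank": statistics.fmean(r),
            "median_rank": statistics.median(r),
            "rank_sd": statistics.pstdev(r),
        }
        for r in per_peak
    ]
```

Peaks that keep a high rank with a small rank standard deviation across runs are the ones least dependent on any particular subset of samples.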

In another preferred embodiment, a method for predicting mass peaks that are
representative of a biomarker for detecting and differentiating between the
progressive stages of cancer is provided, said method comprising:

obtaining samples from normal subjects and subjects suffering from cancer,
and;

providing a biochip array for evaluating the mass peaks of said samples,
wherein;

said biochip array comprising a chemically modified metal affinity surface
having stably attached thereto a plurality of molecules capable of selective binding to
at least one member of the group consisting of proteins, peptides or fragments thereof
and;

using the samples from a normal subject and cancer patient subjects to obtain
a first set of data representative of the first set of biomarkers, and;
evaluating the first set of data detecting at least one or more protein
biomarkers in a subject sample, and;

correlating the detection of one or more protein biomarkers with a progressive
malignant stage of cancer, such as breast cancer as compared to normal
subjects.

In one aspect a set of control data is generated, comprising:
generating a control set of biomarkers representative of a normal
subject;
using the biochip array to generate a control set of data representative
of the control set of biomarkers; and,

comparing the first and control sets of data to predict which mass
peaks are representative of a biomarker that differentiates between the
different malignant stages of cancer progression.

In another aspect the method further comprises:

generating a second set of biomarkers representative of a patient
suffering from cancer, and;

using the biochip array to generate a second set of data representative
of a first stage of cancer; and,

comparing the second and control sets of data to predict which mass
peaks are representative of a biomarker that differentiates between a first and
second stage of cancer progression.

The method is repeated at least one or more times until a set of data is
obtained which is used to predict the mass peaks of any potential biomarker
representative of a certain stage of a malignant cancer.

In another preferred embodiment a biomarker database is constructed which
contains a plurality of data sets representative of the different stages of breast cancer,
and the test set of data is compared with the database to predict the mass peaks
which are potential biomarkers for detecting at least one stage of breast cancer.

The database is used to predict mass peaks which are potential biomarkers for 
detecting any stage of breast cancer and for mining of data from the database for 
evaluation and prediction of potential biomarkers which differentiate between 
different stages of different cancers. 
In another preferred embodiment, tissue samples from cancer patients are
analyzed and compared to normal subjects by obtaining data sets of mass peaks
which are used to determine potential biomarkers which are present in pre-invasive
tumors such as a breast tumor.

Other aspects of the invention are described infra. 

BRIEF DESCRIPTION OF THE FIGURES

Figures 1A through 1N show mass spectra of Markers I through XIV
respectively. In those Figures, the mass spectral peak of the specified marker is
designated within the depicted spectra with an arrow. The Figure designation is set
above each of the referred to spectra.

Figure 2 shows a representative mass peak spectrum obtained by SELDI
analysis of serum proteins retained on an IMAC-Ni2+ chip. The upper panel shows
the spectrum view; the lower panel shows the pseudo-gel view of the same spectrum
of M/Z (mass-dependent velocities) between 4,000 and 10,000.

Figure 3 shows the results of logarithmic transformation on data variance
reduction and equalization.

Figures 4A-4B show a 3-dimensional UMSA component plot of stages 0-I
breast cancer (darker squares) versus non-cancer (white squares).

Figure 4A shows illustrative results of separation achieved using a UMSA
derived linear combination of all 147 peaks.

Figure 4B shows illustrative results of separation achieved using a UMSA
derived linear combination using the three selected peaks.

Figure 5 is a graph showing fifteen peaks with top mean ranks and minimal
rank standard deviations derived from ProPeak Bootstrap Analysis. The horizontal line at
7.0 is the minimum rank standard deviation computed by applying the same
procedure to a randomly generated data set that simulated the distribution of the
original data.

Figures 6A-6B are graphs showing a plot of absolute values of the relative
significance scores of selected peaks based on contribution towards the separation
between stages 0-I breast cancer and the non-cancer controls.

Figure 6A shows the results of 15 peaks selected from ProPeak Bootstrap
Analysis with rank standard deviation < 7.0.

Figure 6B is a graph showing re-evaluated scores of the selected top 4 peaks
from Figure 5A.

Figure 7 is a graph showing receiver-operating-characteristic (ROC) curve
analysis of BC1, BC2, BC3, and a logistic regression derived composite index.
p-values from AUC (area-under-curve) comparison between each individual
biomarker and the Composite Index are listed in the figure.

Figures 8A-8B are scatter plots showing the distribution of the selected
biomarker(s) across all diagnostic groups including clinical stages of the cancer
patients.

Figure 8A is a scatter plot showing the results obtained with BC3 alone.
Figure 8B is a scatter plot showing the results of a logistic regression derived
composite index using BC1, BC2 and BC3.

Figure 9 shows a panel of three 2-dimensional scatter plots depicting
distributions of all patient samples.

DEFINITIONS 

Unless defined otherwise, all technical and scientific terms used herein have
the meaning commonly understood by a person skilled in the art to which this
invention belongs. The following references provide one of skill with a general
definition of many of the terms used in this invention: Singleton et al., Dictionary of
Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of
Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R.
Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins
Dictionary of Biology (1991). As used herein, the following terms have the meanings
ascribed to them unless specified otherwise.

"Gas phase ion spectrometer" refers to an apparatus that detects gas phase
ions. Gas phase ion spectrometers include an ion source that supplies gas phase ions.
Gas phase ion spectrometers include, for example, mass spectrometers, ion mobility
spectrometers, and total ion current measuring devices. "Gas phase ion spectrometry"
refers to the use of a gas phase ion spectrometer to detect gas phase ions.

"Mass spectrometer" refers to a gas phase ion spectrometer that measures a
parameter that can be translated into mass-to-charge ratios of gas phase ions. Mass
spectrometers generally include an ion source and a mass analyzer. Examples of mass
spectrometers are time-of-flight, magnetic sector, quadrupole filter, ion trap, ion
cyclotron resonance, electrostatic sector analyzer and hybrids of these. "Mass
spectrometry" refers to the use of a mass spectrometer to detect gas phase ions.

"Laser desorption mass spectrometer" refers to a mass spectrometer that uses
laser energy as a means to desorb, volatilize, and ionize an analyte.

"Tandem mass spectrometer" refers to any mass spectrometer that is capable
of performing two successive stages of m/z-based discrimination or measurement of
ions, including ions in an ion mixture. The phrase includes mass spectrometers
having two mass analyzers that are capable of performing two successive stages of
m/z-based discrimination or measurement of ions tandem-in-space. The phrase
further includes mass spectrometers having a single mass analyzer that is capable of
performing two successive stages of m/z-based discrimination or measurement of ions
tandem-in-time. The phrase thus explicitly includes Qq-TOF mass spectrometers, ion
trap mass spectrometers, ion trap-TOF mass spectrometers, TOF-TOF mass
spectrometers, Fourier transform ion cyclotron resonance mass spectrometers,
electrostatic sector - magnetic sector mass spectrometers, and combinations thereof.

"Mass analyzer" refers to a sub-assembly of a mass spectrometer that
comprises means for measuring a parameter that can be translated into mass-to-charge
ratios of gas phase ions. In a time-of-flight mass spectrometer the mass analyzer
comprises an ion optic assembly, a flight tube and an ion detector.

"Ion source" refers to a sub-assembly of a gas phase ion spectrometer that
provides gas phase ions. In one embodiment, the ion source provides ions through a
desorption/ionization process. Such embodiments generally comprise a probe
interface that positionally engages a probe in an interrogatable relationship to a source
of ionizing energy (e.g., a laser desorption/ionization source) and in concurrent
communication at atmospheric or subatmospheric pressure with a detector of a gas
phase ion spectrometer.

Forms of ionizing energy for desorbing/ionizing an analyte from a solid phase
include, for example: (1) laser energy; (2) fast atoms (used in fast atom
bombardment); (3) high energy particles generated via beta decay of radionuclides
(used in plasma desorption); and (4) primary ions generating secondary ions (used in
secondary ion mass spectrometry). The preferred form of ionizing energy for solid
phase analytes is a laser (used in laser desorption/ionization), in particular, nitrogen
lasers, Nd-YAG lasers and other pulsed laser sources. "Fluence" refers to the energy
delivered per unit area of interrogated image. A high fluence source, such as a laser,
will deliver about 1 mJ/mm² to 50 mJ/mm². Typically, a sample is placed on the
surface of a probe, the probe is engaged with the probe interface and the probe surface
is struck with the ionizing energy. The energy desorbs analyte molecules from the
surface into the gas phase and ionizes them.

Other forms of ionizing energy for analytes include, for example: (1)
electrons that ionize gas phase neutrals; (2) strong electric field to induce ionization
from gas phase, solid phase, or liquid phase neutrals; and (3) a source that applies a
combination of ionization particles or electric fields with neutral chemicals to induce
chemical ionization of solid phase, gas phase, and liquid phase neutrals.

"Probe" in the context of this invention refers to a device adapted to engage a
probe interface of a gas phase ion spectrometer (e.g., a mass spectrometer) and to
present an analyte to ionizing energy for ionization and introduction into a gas phase
ion spectrometer, such as a mass spectrometer. A "probe" will generally comprise a
solid substrate (either flexible or rigid) comprising a sample presenting surface on
which an analyte is presented to the source of ionizing energy.

"Surface-enhanced laser desorption/ionization" or "SELDI" refers to a method
of desorption/ionization gas phase ion spectrometry (e.g., mass spectrometry) in
which the analyte is captured on the surface of a SELDI probe that engages the probe
interface of the gas phase ion spectrometer. In "SELDI MS," the gas phase ion
spectrometer is a mass spectrometer. SELDI technology is described in, e.g., U.S.
patent 5,719,060 (Hutchens and Yip) and U.S. patent 6,225,047 (Hutchens and Yip).

"Surface-Enhanced Affinity Capture" or "SEAC" is a version of SELDI that
involves the use of probes comprising an adsorbent surface (a "SEAC probe").
"Adsorbent surface" refers to a surface to which is bound an adsorbent (also called a
"capture reagent" or an "affinity reagent"). An adsorbent is any material capable of
binding an analyte (e.g., a target polypeptide or nucleic acid). "Chromatographic
adsorbent" refers to a material typically used in chromatography. Chromatographic
adsorbents include, for example, ion exchange materials, metal chelators (e.g.,
nitriloacetic acid or iminodiacetic acid), immobilized metal chelates, hydrophobic
interaction adsorbents, hydrophilic interaction adsorbents, dyes, simple biomolecules
(e.g., nucleotides, amino acids, simple sugars and fatty acids) and mixed mode
adsorbents (e.g., hydrophobic attraction/electrostatic repulsion adsorbents).

"Biospecific adsorbent" refers to an adsorbent comprising a biomolecule, e.g., a nucleic
acid molecule (e.g., an aptamer), a polypeptide, a polysaccharide, a lipid, a steroid or
a conjugate of these (e.g., a glycoprotein, a lipoprotein, a glycolipid, a nucleic acid
(e.g., DNA)-protein conjugate). In certain instances the biospecific adsorbent can be
a macromolecular structure such as a multiprotein complex, a biological membrane or
a virus. Examples of biospecific adsorbents are antibodies, receptor proteins and
nucleic acids. Biospecific adsorbents typically have higher specificity for a target
analyte than chromatographic adsorbents. Further examples of adsorbents for use in
SELDI can be found in U.S. Patent 6,225,047 (Hutchens and Yip, "Use of retentate
chromatography to generate difference maps," May 1, 2001).

In some embodiments, a SEAC probe is provided as a pre-activated surface
which can be modified to provide an adsorbent of choice. For example, certain
probes are provided with a reactive moiety that is capable of binding a biological
molecule through a covalent bond. Epoxide and carbodiimidizole are useful reactive
moieties to covalently bind biospecific adsorbents such as antibodies or cellular
receptors.

"Adsorption" refers to detectable non-covalent binding of an analyte to an
adsorbent or capture reagent.

"Surface-Enhanced Neat Desorption" or "SEND" is a version of SELDI that
involves the use of probes comprising energy absorbing molecules chemically bound
to the probe surface (a "SEND probe"). "Energy absorbing molecules" ("EAM") refer
to molecules that are capable of absorbing energy from a laser desorption/ionization
source and thereafter contributing to desorption and ionization of analyte molecules in
contact therewith. The phrase includes molecules used in MALDI, frequently
referred to as "matrix", and explicitly includes cinnamic acid derivatives, sinapinic
acid ("SPA"), cyano-hydroxy-cinnamic acid ("CHCA") and dihydroxybenzoic acid,
ferulic acid, hydroxyacetophenone derivatives, as well as others. It also includes
EAMs used in SELDI. SEND is further described in United States patent 5,719,060
and United States patent application 60/408,255, filed September 4, 2002 (Kitagawa,
"Monomers And Polymers Having Energy Absorbing Moieties Of Use In
Desorption/Ionization Of Analytes").

"Surface-Enhanced Photolabile Attachment and Release" or "SEPAR" is a
version of SELDI that involves the use of probes having moieties attached to the
surface that can covalently bind an analyte, and then release the analyte through
breaking a photolabile bond in the moiety after exposure to light, e.g., laser light.
SEPAR is further described in United States patent 5,719,060.

"Eluant" or "wash solution" refers to an agent, typically a solution, which is
used to affect or modify adsorption of an analyte to an adsorbent surface and/or
remove unbound materials from the surface. The elution characteristics of an eluant
can depend, for example, on pH, ionic strength, hydrophobicity, degree of
chaotropism, detergent strength and temperature.

"Analyte" refers to any component of a sample that is desired to be detected.
The term can refer to a single component or a plurality of components in the sample.

The "complexity" of a sample adsorbed to an adsorption surface of an affinity
capture probe means the number of different protein species that are adsorbed.

"Molecular binding partners" and "specific binding partners" refer to pairs of
molecules, typically pairs of biomolecules, that exhibit specific binding. Molecular
binding partners include, without limitation, receptor and ligand, antibody and
antigen, biotin and avidin, and biotin and streptavidin.

"Monitoring" refers to recording changes in a continuously varying parameter.

"Biochip" refers to a solid substrate having a generally planar surface to which
an adsorbent is attached. Frequently, the surface of the biochip comprises a plurality
of addressable locations, each of which has the adsorbent bound there.
Biochips can be adapted to engage a probe interface and, therefore, function as
probes.

"Protein biochip" refers to a biochip adapted for the capture of polypeptides.
Many protein biochips are described in the art. These include, for example, protein
biochips produced by Ciphergen Biosystems (Fremont, CA), Packard Bioscience
Company (Meriden, CT), Zyomyx (Hayward, CA) and Phylos (Lexington, MA).
Examples of such protein biochips are described in the following patents or patent
applications: U.S. patent 6,225,047 (Hutchens and Yip, "Use of retentate
chromatography to generate difference maps," May 1, 2001); International
publication WO 99/51773 (Kuimelis and Wagner, "Addressable protein arrays,"
October 14, 1999); U.S. patent 6,329,209 (Wagner et al., "Arrays of protein-capture
agents and methods of use thereof," December 11, 2001) and International publication
WO 00/56934 (Englert et al., "Continuous porous matrix arrays," September 28,
2000).

Protein biochips produced by Ciphergen Biosystems comprise surfaces having
chromatographic or biospecific adsorbents attached thereto at addressable locations.
Ciphergen ProteinChip® arrays include NP20, H4, H50, SAX-2, WCX-2, CM-10,
IMAC-3, IMAC-30, LSAX-30, LWCX-30, IMAC-40, PS-10, PS-20 and PG-20.
These protein biochips comprise an aluminum substrate in the form of a strip. The
surface of the strip is coated with silicon dioxide.

In the case of the NP-20 biochip, silicon oxide functions as a hydrophilic adsorbent to
capture hydrophilic proteins.

H4, H50, SAX-2, WCX-2, CM-10, IMAC-3, IMAC-30, PS-10 and PS-20
biochips further comprise a functionalized, cross-linked polymer in the form of a
hydrogel physically attached to the surface of the biochip or covalently attached
through a silane to the surface of the biochip. The H4 biochip has isopropyl
functionalities for hydrophobic binding. The H50 biochip has nonylphenoxy-
poly(ethylene glycol)methacrylate for hydrophobic binding. The SAX-2 biochip has
quaternary ammonium functionalities for anion exchange. The WCX-2 and CM-10
biochips have carboxylate functionalities for cation exchange. The IMAC-3 and
IMAC-30 biochips have nitriloacetic acid functionalities that adsorb transition metal
ions, such as Cu2+ and Ni2+, by chelation. These immobilized metal ions allow
adsorption of peptides and proteins by coordinate bonding. The PS-10 biochip has
carboimidizole functional groups that can react with groups on proteins for covalent
binding. The PS-20 biochip has epoxide functional groups for covalent binding with
proteins. The PS-series biochips are useful for binding biospecific adsorbents, such as
antibodies, receptors, lectins, heparin, Protein A, biotin/streptavidin and the like, to
chip surfaces where they function to specifically capture analytes from a sample. The
PG-20 biochip is a PS-20 chip to which Protein G is attached. The LSAX-30 (anion
exchange), LWCX-30 (cation exchange) and IMAC-40 (metal chelate) biochips have
functionalized latex beads on their surfaces. Such biochips are further described in:
WO 00/66265 (Rich et al., "Probes for a Gas Phase Ion Spectrometer," November 9,
2000); WO 00/67293 (Beecher et al., "Sample Holder with Hydrophobic Coating for
Gas Phase Mass Spectrometer," November 9, 2000); U.S. patent application
US20030032043A1 (Pohl and Papanu, "Latex Based Adsorbent Chip," July 16, 2002)
and U.S. patent application 60/350,110 (Um et al., "Hydrophobic Surface Chip,"
November 8, 2001).

Upon capture on a reaction medium or substrate such as a biochip, analytes
can be detected by a variety of detection methods selected from, for example, a gas
phase ion spectrometry method, an optical method, an electrochemical method,
atomic force microscopy and a radio frequency method. Gas phase ion spectrometry
methods are described herein. Of particular interest is the use of mass spectrometry
and, in particular, SELDI. Optical methods include, for example, detection of
fluorescence, luminescence, chemiluminescence, absorbance, reflectance,
transmittance, birefringence or refractive index (e.g., surface plasmon resonance,
ellipsometry, a resonant mirror method, a grating coupler waveguide method or
interferometry). Optical methods include microscopy (both confocal and non-
confocal), imaging methods and non-imaging methods. Immunoassays in various
formats (e.g., ELISA) are popular methods for detection of analytes captured on a
solid phase. Electrochemical methods include voltammetry and amperometry methods.
Radio frequency methods include multipolar resonance spectroscopy.

"Marker" in the context of the present invention refers to a polypeptide (of a
particular apparent molecular weight) which is differentially present in a sample taken
from patients having human cancer as compared to a comparable sample taken from
control subjects (e.g., a person with a negative diagnosis or undetectable cancer,
normal or healthy subject).

The phrase "differentially present" refers to differences in the quantity and/or
the frequency of a marker present in a sample taken from patients having human
cancer as compared to a control subject. For example, a marker can be a polypeptide
which is present at an elevated level or at a decreased level in samples of human
cancer patients compared to samples of control subjects. Alternatively, a marker can
be a polypeptide which is detected at a higher frequency or at a lower frequency in
samples of human cancer patients compared to samples of control subjects. A marker
can be differentially present in terms of quantity, frequency or both.

A polypeptide is differentially present between the two samples if the amount
of the polypeptide in one sample is statistically significantly different from the
amount of the polypeptide in the other sample. For example, a polypeptide is
differentially present between the two samples if it is present at least about 120%, at
least about 130%, at least about 150%, at least about 180%, at least about 200%, at
least about 300%, at least about 500%, at least about 700%, at least about 900%, or at
least about 1000% greater than it is present in the other sample, or if it is detectable in
one sample and not detectable in the other.

Alternatively or additionally, a polypeptide is differentially present between
the two sets of samples if the frequency of detecting the polypeptide in the human
cancer patients' samples is statistically significantly higher or lower than in the
control samples. For example, a polypeptide is differentially present between the two
sets of samples if it is detected at least about 120%, at least about 130%, at least about
150%, at least about 180%, at least about 200%, at least about 300%, at least about
500%, at least about 700%, at least about 900%, or at least about 1000% more
frequently or less frequently in one set of samples than in the other set of
samples.

"Diagnostic" means identifying the presence or nature of a pathologic
condition. Diagnostic methods differ in their sensitivity and specificity. The
"sensitivity" of a diagnostic assay is the percentage of diseased individuals who test
positive (percent of "true positives"). Diseased individuals not detected by the assay
are "false negatives." Subjects who are not diseased and who test negative in the
assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus
the false positive rate, where the "false positive" rate is defined as the proportion of
those without the disease who test positive. While a particular diagnostic method may
not provide a definitive diagnosis of a condition, it suffices if the method provides a
positive indication that aids in diagnosis.
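
The definitions of sensitivity and specificity given above can be illustrated with a short sketch. The function name and the counts below are hypothetical and chosen only for illustration; they are not data from this disclosure.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions between 0 and 1.

    sensitivity = fraction of diseased individuals who test positive
    specificity = 1 - false positive rate, where the false positive rate is
                  the fraction of non-diseased individuals who test positive
    """
    diseased = true_pos + false_neg
    non_diseased = true_neg + false_pos
    if diseased == 0 or non_diseased == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    sensitivity = true_pos / diseased
    false_positive_rate = false_pos / non_diseased
    return sensitivity, 1.0 - false_positive_rate


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(true_pos=90, false_neg=10, true_neg=45, false_pos=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```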
A "test amount" of a marker refers to an amount of a marker present in a
sample being tested. A test amount can be either in absolute amount (e.g., ng/ml) or a
relative amount (e.g., relative intensity of signals).

A "diagnostic amount" of a marker refers to an amount of a marker in a
subject's sample that is consistent with a diagnosis of human cancer. A diagnostic
amount can be either in absolute amount (e.g., µg/ml) or a relative amount (e.g.,
relative intensity of signals).

A "control amount" of a marker can be any amount or a range of amounts
which is to be compared against a test amount of a marker. For example, a control
amount of a marker can be the amount of a marker in a person without human cancer.
A control amount can be either in absolute amount (e.g., µg/ml) or a relative amount
(e.g., relative intensity of signals).

"Antibody" refers to a polypeptide ligand substantially encoded by an
immunoglobulin gene or immunoglobulin genes, or fragments thereof, which
specifically binds and recognizes an epitope (e.g., an antigen). The recognized
immunoglobulin genes include the kappa and lambda light chain constant region
genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes,
and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as
intact immunoglobulins or as a number of well characterized fragments produced by
digestion with various peptidases. This includes, e.g., Fab' and F(ab)'2 fragments.
The term "antibody," as used herein, also includes antibody fragments either
produced by the modification of whole antibodies or those synthesized de novo using
recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal
antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies.
"Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain
that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3,
but does not include the heavy chain variable region.

"Immunoassay" is an assay that uses an antibody to specifically bind an
antigen (e.g., a marker). The immunoassay is characterized by the use of specific
binding properties of a particular antibody to isolate, target, and/or quantify the
antigen.

The phrase "specifically (or selectively) binds" to an antibody or "specifically
(or selectively) immunoreactive with," when referring to a protein or peptide, refers to
a binding reaction that is determinative of the presence of the protein in a
heterogeneous population of proteins and other biologics. Thus, under designated
immunoassay conditions, the specified antibodies bind to a particular protein at least
two times the background and do not substantially bind in a significant amount to
other proteins present in the sample. Specific binding to an antibody under such
conditions may require an antibody that is selected for its specificity for a particular
protein. For example, polyclonal antibodies raised to marker Br 1 from specific
species such as rat, mouse, or human can be selected to obtain only those polyclonal
antibodies that are specifically immunoreactive with marker Br 1 and not with other
proteins, except for polymorphic variants and alleles of marker Br 1. This selection
may be achieved by subtracting out antibodies that cross-react with marker Br 1
molecules from other species. A variety of immunoassay formats may be used to
select antibodies specifically immunoreactive with a particular protein. For example,
solid-phase ELISA immunoassays are routinely used to select antibodies specifically
immunoreactive with a protein (see Harlow & Lane, Antibodies, A Laboratory
Manual (1988), for a description of immunoassay formats and conditions that can be
used to determine specific immunoreactivity). Typically a specific or selective
reaction will be at least twice background signal or noise and more typically more
than 10 to 100 times background.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates to a method for identification of tumor
biomarkers for breast cancer, with high specificity and sensitivity.

Fourteen (14) biomarkers were identified that are associated with breast cancer
disease status. The corresponding proteins or fragments of proteins for the fourteen
biomarkers are represented as intensity peaks in SELDI (surface enhanced laser
desorption/ionization) protein chip/mass spectra with molecular masses centered
around the following values:

Marker I (BC1):      4283 daltons
Marker II (BC2):     8126 daltons
Marker III (BC3):    8932 daltons
Marker IV:           4465 daltons
Marker V:            4060 daltons
Marker VI:           8322 daltons
Marker VII:          17046 daltons
Marker VIII:         17696 daltons
Marker IX:           10240 daltons
Marker X:            5891 daltons
Marker XI:           8426 daltons
Marker XII:          7541 daltons
Marker XIII:         9413 daltons
Marker XIV:          16244 daltons.

These masses for Markers I through XIV are considered accurate to within
0.15 percent of the specified value as determined by the disclosed SELDI-mass
spectroscopy protocol.
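
As an illustration of the stated 0.15 percent mass accuracy, the following minimal sketch checks whether an observed peak mass falls within that tolerance of any listed marker mass. The function name and the example observed masses are hypothetical; the marker masses are the values listed above (the first three are also stated in the text for BC1, BC2 and BC3).

```python
# Marker masses in daltons, as listed in this description.
MARKER_MASSES_DA = {
    "I (BC1)": 4283, "II (BC2)": 8126, "III (BC3)": 8932, "IV": 4465,
    "V": 4060, "VI": 8322, "VII": 17046, "VIII": 17696, "IX": 10240,
    "X": 5891, "XI": 8426, "XII": 7541, "XIII": 9413, "XIV": 16244,
}

RELATIVE_TOLERANCE = 0.0015  # 0.15 percent of the specified value


def match_markers(observed_mass_da, tolerance=RELATIVE_TOLERANCE):
    """Return the names of markers whose mass lies within the tolerance
    of the observed mass (hypothetical helper, for illustration only)."""
    return [
        name
        for name, mass in MARKER_MASSES_DA.items()
        if abs(observed_mass_da - mass) <= tolerance * mass
    ]


# Hypothetical observed peak masses.
print(match_markers(8930.0))  # within 0.15% of 8932 daltons
print(match_markers(8400.0))  # 26 daltons from 8426, outside the tolerance
```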

We have found that Markers I through XIV are differentially present in
quantity and/or frequency in samples taken from patients having human breast cancer
as compared to a control subject. In particular, each marker is either up regulated
in cancer patient samples (i.e., elevated level in samples of breast cancer patients
compared to samples from control subjects) or down regulated (i.e.,
decreased level in samples of breast cancer patients compared to samples from control
subjects), as follows:

Marker I (BC1):      down regulated in cancer patient sample
Marker II (BC2):     up regulated in cancer patient sample
Marker III (BC3):    up regulated in cancer patient sample
Marker IV:           up regulated in cancer patient sample
Marker V:            [direction illegible in source] regulated in cancer patient sample
Marker VI:           up regulated in cancer patient sample
Marker VII:          up regulated in cancer patient sample
Marker VIII:         up regulated in cancer patient sample
Marker IX:           down regulated in cancer patient sample
Marker X:            down regulated in cancer patient sample
Marker XI:           up regulated in cancer patient sample
Marker XII:          down regulated in cancer patient sample
Marker XIII:         up regulated in cancer patient sample
Marker XIV:          up regulated in cancer patient sample


The association between Markers I through XIV and the absence or presence
of breast cancer was established in patients with breast cancer before tumor resection
and treatment at various stages (n=103), women with benign breast diseases (n=25)
and healthy women (n=41) without known neoplastic diseases. The high specificity
and sensitivity of the method used for identifying the biomarkers that differentiate
between the different stages of breast cancer is underscored by using only three of
these biomarkers, 4283 (BC1), 8126 (BC2) and 8932 (BC3), to correctly identify 93%
of breast cancer patients at different stages: stage 0/I (93%), stage II (85%) and stage
III (94%). Using only one biomarker (BC3), correct identification of 85% of breast
cancer patients was achieved: stage 0/I (88%), stage II (78%) and stage III (92%).

In particular, simultaneous analysis of protein profiles of 169 serum samples
of subjects with or without breast cancer was carried out and the results
demonstrate the high specificity and selectivity of the methods described herein. From
the 169 serum samples, three discriminating biomarkers were identified,
the combination of which achieved both high sensitivity (93%) and high specificity
(91%) in detecting breast cancer from the non-cancer controls.

As discussed above, Markers I through XIV also may be characterized based
on affinity for an adsorbent, particularly binding to an immobilized Ni chelate
(IMAC) substrate surface under the conditions specified under ProteinChip Analysis
of the General Comments of the Examples which follow, which conditions include: 30
µl of 8M urea, 1% CHAPS in PBS, pH 7.4 is added to a 20 µl serum sample; the
diluted sample is vortexed at 4°C for 15 minutes and diluted 1:40 in PBS;
immobilized metal affinity capture chips (IMAC3) are activated with 50 mM NiSO4
according to manufacturer's instructions (Ciphergen Biosystems, Inc., CA); 50 µl of
the diluted serum samples are applied to each spot on the ProteinChip array by using a
96 well bioprocessor (Ciphergen Biosystems, Inc., CA); after binding at room
temperature for 60 minutes on a platform shaker, the array is washed twice with 100
µl of PBS for 5 minutes followed by two rinses with 100 µl of dH2O; binding can be
detected with a mass reader. References herein that a particular protein marker can be
characterized as binding to an IMAC Ni adsorbent indicate detection of binding of
the marker with a serum sample processed under those conditions.

The following illustrative example of the invention for identification of
biomarkers is not meant to limit or construe the invention in any way. To identify
serum biomarkers with potential for early detection of breast cancer, protein profiles
of specimens from cancer patients at stages 0 and I were compared to those from the
non-cancer controls. The selected biomarkers were then tested using data from breast
cancer patients at stages II and III, which were not included in the biomarker selection
process. High-throughput profiling of complex protein expression patterns greatly
facilitates the screening of a large number of potential markers simultaneously. The
UMSA algorithm provided an efficient model to rank a large number of peaks
collectively according to their contribution to the separation of two predefined
diagnostic groups. The ProPeak Bootstrap module introduced random perturbations
in multiple runs to test the consistency of the top-ranked peaks, measured by the
standard deviation of computed ranks from multiple runs. In order to establish an
upper bound cutoff value on a peak's rank standard deviation for its performance not
to be considered as purely by chance, the same bootstrap procedure was applied to a
randomly generated data set that simulates the distribution of the real data. The
minimum value of rank standard deviations from such "simulated peaks" indicates the
level of consistency that a peak might achieve by random chance. This minimum
value was used as the cutoff to help reduce the original 147 peaks to a subset of 15
top-ranked peaks for further consideration. The performance of such peaks should be
less likely due to random artifacts in the data.
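
The rank-consistency idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the UMSA algorithm or the ProPeak Bootstrap module: the per-peak score (absolute difference of group means divided by the standard deviation of the combined values) is a simple stand-in for the UMSA ranking, the random perturbation is modelled as resampling with replacement, and the data are synthetic Gaussian values. All function names and parameters are hypothetical.

```python
import random
import statistics


def separation_score(group_a, group_b):
    """Stand-in per-peak score (NOT UMSA): absolute difference of group
    means divided by the standard deviation of the combined values."""
    combined_sd = statistics.pstdev(group_a + group_b)
    if combined_sd == 0:
        return 0.0
    return abs(statistics.fmean(group_a) - statistics.fmean(group_b)) / combined_sd


def rank_peaks(cases, controls):
    """Rank peaks from 1 (most discriminating) to n. Rows are samples,
    columns are peaks."""
    n_peaks = len(cases[0])
    scores = [
        separation_score([row[j] for row in cases], [row[j] for row in controls])
        for j in range(n_peaks)
    ]
    order = sorted(range(n_peaks), key=lambda j: -scores[j])
    ranks = [0] * n_peaks
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def bootstrap_rank_stats(cases, controls, n_runs=100, seed=0):
    """Resample each group with replacement n_runs times and return, for each
    peak, the (mean rank, rank standard deviation) across runs."""
    rng = random.Random(seed)
    per_peak = [[] for _ in range(len(cases[0]))]
    for _ in range(n_runs):
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        for j, rank in enumerate(rank_peaks(boot_cases, boot_controls)):
            per_peak[j].append(rank)
    return [(statistics.fmean(r), statistics.pstdev(r)) for r in per_peak]


def simulate(n_samples, n_peaks, shift, rng):
    """Synthetic Gaussian data; only peak 0 is shifted by `shift`."""
    return [
        [rng.gauss(shift if j == 0 else 0.0, 1.0) for j in range(n_peaks)]
        for _ in range(n_samples)
    ]


rng = random.Random(42)
cases = simulate(30, 10, 5.0, rng)      # peak 0 differs strongly between groups
controls = simulate(30, 10, 0.0, rng)
stats = bootstrap_rank_stats(cases, controls)

# Chance-level cutoff: the same procedure on data with no group difference.
noise_stats = bootstrap_rank_stats(simulate(30, 10, 0.0, rng), simulate(30, 10, 0.0, rng))
cutoff = min(sd for _, sd in noise_stats)

# Peaks whose rank standard deviation is below the chance-level minimum.
stable_peaks = [j for j, (_, sd) in enumerate(stats) if sd < cutoff]
print(stats[0], cutoff, stable_peaks)
```

In this sketch a peak is retained only when its rank varies less across resampled runs than any peak in the purely random data set, which mirrors the role of the 7.0 cutoff described in the text.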
For simplicity, the composite index was derived by simple multivariate
logistic regression. When these selected biomarkers are further validated, more
complex and nonlinear classification models may be employed to combine the
multiple biomarkers. The use of complex modeling methods on carefully screened
and tested biomarkers should in general offer a more robust performance than the
direct application of such methods on raw data from a large number of mass peaks.
The discriminatory power of the selected biomarkers was verified using stages II-III
data as an independent test set. The bootstrap cross-validation estimation of
performance offers statistical confidence on the generalizability of these biomarkers
over future data. Of the three biomarkers selected, no significant correlation was
found between the level of these markers and the tumor size or lymph node
metastasis. The discriminatory power of these markers is therefore most likely
reflecting the malignant nature of the tumor rather than the progression of it.
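
A composite index of the kind described above, derived by multivariate logistic regression, has the general form of a logistic function applied to a weighted sum of the marker levels. The sketch below illustrates that form only. The coefficient values are hypothetical placeholders and are not the fitted values of this disclosure; only their signs follow the text (BC1 down regulated, BC2 and BC3 up regulated in cancer patient samples). The inputs are assumed to be suitably transformed peak intensities.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted values).
EXAMPLE_COEFFICIENTS = {"intercept": 0.0, "BC1": -1.0, "BC2": 1.0, "BC3": 1.0}


def composite_index(bc1, bc2, bc3, coef=EXAMPLE_COEFFICIENTS):
    """Logistic-regression style composite index in the range (0, 1).

    bc1, bc2, bc3 are assumed to be transformed peak intensities of the
    three markers; larger index values indicate a profile more like cancer.
    """
    linear = (
        coef["intercept"]
        + coef["BC1"] * bc1
        + coef["BC2"] * bc2
        + coef["BC3"] * bc3
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical marker levels.
print(composite_index(0.0, 0.0, 0.0))   # neutral profile
print(composite_index(-1.0, 1.0, 1.0))  # low BC1, high BC2 and BC3
```

A sample would then be classified by comparing its index to a chosen threshold, with the threshold trading sensitivity against specificity.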

As used herein, "tumor stage" or "tumor progression" refers to the different
clinical stages of the tumor. Clinical stages of a tumor are defined by various
parameters which are well-established in the field of medicine. Some of the
parameters include morphology, size of tumor, the degree to which it has metastasized
through the patient's body and the like.

Cancer in humans appears to be a multi-step process which involves
progression from pre-malignant to malignant to metastatic disease which ultimately
kills the patient. Epidemiological studies in humans have established that certain
pathologic conditions are "pre-malignant" because they are associated with increased
risk of malignancy. There is precedent for detecting and eliminating pre-invasive
lesions as a cancer prevention strategy: dysplasia and carcinoma in-situ of the uterine
cervix are examples of pre-malignancies which have been successfully employed in
the prevention of cervical cancer by cytologic screening methods. Unfortunately,
because the breast cannot be sampled as readily as the cervix, the development of
screening methods for breast pre-malignancy involves more complex approaches than
the cytomorphologic screening currently employed to detect cervical cancer.

Pre-malignant breast disease is also characterized by an apparent 
morphological progression from atypical hyperplasias, to carcinoma in-situ 
(pre-invasive cancer) to invasive cancer which ultimately spreads and metastasizes 
resulting in the death of the patient. Careful histologic examination of breast biopsies 
has demonstrated intermediate stages which have acquired some of these 
characteristics but not others. Detailed epidemiological studies have established that 
different morphologic lesions progress at different rates, varying from atypical 
hyperplasia (with a low risk) to comedo ductal carcinoma-in-situ (DCIS) which 
progresses to invasive cancer in a high percentage of patients (London et al., 1991; 
Page et al., 1982; Page et al., 1985; Page et al., 1991; and Page et al., 1978). Family 
history is also an important risk factor in the development of breast cancer and 
increases the relative risk of these pre-malignant lesions (Dupont et al., 1985; Dupont 
et al., 1993; and London et al., 1991). Of particular interest is non-comedo 
carcinoma-in-situ which is associated with a greater than ten-fold increased relative 
risk of breast cancer compared to control groups (Ottesen et al., 1992; Page et al., 
1982). Two other reasons besides an increased relative risk support the concept that 
DCIS is pre-malignant: 1) When breast cancer occurs in these patients it regularly 
occurs in the same region of the same breast where the DCIS was found; and 2) DCIS 
is frequently present in tissue adjacent to invasive breast cancer (Ottesen et al., 1992; 
Schwartz et al., 1992). For these reasons DCIS very likely represents a rate-limiting 
step in the development of invasive breast cancer in women. 

I. Examples of Preferred Embodiments 

In a preferred embodiment, the invention provides methods for aiding a human 
cancer diagnosis using one or more markers, for example Markers 4283 (BC1), 8126 
(BC2) and 8932 (BC3). These markers can be used alone, in combination with other 
markers in any set, or with entirely different markers (e.g., CA15.3 and CA27.29) in 
aiding human cancer diagnosis. The markers are differentially present in samples of a 
human cancer patient, for example a breast cancer patient, and a normal subject in 
whom human cancer is undetectable. For example, some of the markers are 
expressed at an elevated level and/or are present at a higher frequency in human 
cancer patients than in normal subjects. Therefore, detection of one or more of these 
markers in a person would provide useful information regarding the probability that 
the person may have human cancer and could also be used to determine the clinical 
stage of the tumor. 

Accordingly, embodiments of the invention include methods for diagnosing 
and differentiating between the different malignant stages of cancer by (a) using a 
biochip array to generate a first set of data representative of the first set of biological 
markers; (b) evaluating the first set of data, detecting at least one or more protein 
biomarkers in a subject sample; and (c) correlating the detection of one or more protein 
biomarkers with a progressive malignant stage of cancer as compared to normal 
subjects. The correlation may take into account the amount of the marker or markers 
in the sample compared to a control amount of the marker or markers (up or down 
regulation of the marker or markers) (e.g., in normal subjects in whom human cancer 
is undetectable). The correlation may take into account the presence or absence of the 
markers in a test sample and the frequency of detection of the same markers in a 
control. The correlation may take into account both of such factors to facilitate 
determination of whether a subject has a human cancer or not. 

Any suitable samples can be obtained from a subject to detect markers. 
Preferably, a sample is a blood serum sample from the subject. If desired, the sample 
can be prepared as described above to enhance detectability of the markers. For 
example, to increase the detectability of markers 4283 (BC1), 8126 (BC2) and 8932 
(BC3), a blood serum sample from the subject can be preferably fractionated by, e.g., 
Cibacron blue agarose chromatography and single stranded DNA affinity 
chromatography, anion exchange chromatography and the like. Sample 
preparation, such as pre-fractionation protocols, is optional and may not be necessary 
to enhance detectability of markers depending on the methods of detection used. For 
example, sample preparation may be unnecessary if antibodies that specifically bind 
markers are used to detect the presence of markers in a sample. 

Any suitable method can be used to detect a marker or markers in a sample. 
For example, gas phase ion spectrometry or an immunoassay can be used as described 
above. Using these methods, one or more markers can be detected. Preferably, a 
sample is tested for the presence of a plurality of markers. Detecting the presence of a 
plurality of markers, rather than a single marker alone, would provide more 
information for the diagnostician. Specifically, the detection of a plurality of markers 
in a sample would increase the percentage of true positive and true negative diagnoses 
and would decrease the percentage of false positive or false negative diagnoses. 
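
To make the terms concrete, the Python sketch below computes sensitivity (true 
positive rate) and specificity (true negative rate) for a single marker and for a 
three-marker panel. The 0/1 calls are contrived, and combining markers by majority 
vote is an assumption made here for illustration; it is not a rule stated in this 
document. 

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for 0/1 calls against 0/1 truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    return tp / (tp + fn), tn / (tn + fp)

def majority_vote(*marker_calls):
    """Call a sample positive when more than half of the markers are positive."""
    return [1 if 2 * sum(calls) > len(calls) else 0 for calls in zip(*marker_calls)]

# Contrived calls for eight samples: four with cancer (1) and four without (0).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
marker_a = [1, 1, 0, 1, 0, 0, 1, 0]
marker_b = [1, 0, 1, 1, 0, 1, 0, 0]
marker_c = [0, 1, 1, 1, 0, 0, 0, 1]

panel = majority_vote(marker_a, marker_b, marker_c)
single = sensitivity_specificity(marker_a, actual)
combined = sensitivity_specificity(panel, actual)
```

In this contrived case each marker errs on different samples, so the panel corrects 
every error. With real data the gain depends on how strongly the markers' errors 
are correlated. 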

The detection of the marker or markers is then correlated with a probable 
diagnosis of human cancer. In some embodiments, the detection of the mere presence 
or absence of a marker, without quantifying the amount of marker, is useful and can 
be correlated with a probable diagnosis of human cancer and the determination of the 
clinical stage of the tumor. For example, using only three of these biomarkers, 4283 
(BC1), 8126 (BC2) and 8932 (BC3), 93% of breast cancer patients were correctly 
identified at the following stages: stage 0/I (93%), stage II (85%) and stage III (94%). 
With only one biomarker (BC3), 85% of breast cancer patients with stage 0/I (88%), 
stage II (78%) and stage III (92%) were correctly identified. Thus, a mere detection 
of one or more of these markers in a subject being tested indicates that the subject has 
progressed to a different clinical stage of the tumor. 

In other embodiments, the detection of markers can involve quantifying the 
markers to correlate the detection of markers with a probable diagnosis of human 
cancer. Thus, if the amount of the markers detected in a subject being tested is higher 
compared to a control amount, then the subject being tested has a higher probability 
of having a human cancer. 

Similarly, in another embodiment, the detection of markers can further involve 
quantifying the markers to correlate the detection of markers with a probable 
diagnosis of human cancer wherein the markers are present in lower quantities in 
blood serum samples from human cancer patients than in blood serum samples of 
normal subjects. Thus, if the amount of the markers detected in a subject being tested 
is lower compared to a control amount, then the subject being tested has a higher 
probability of having a human cancer. 

When the markers are quantified, the amount can be compared to a control. A control 
can be, e.g., the average or median amount of marker present in comparable samples 
of normal subjects in whom human cancer is undetectable. The control amount is 
measured under the same or substantially similar experimental conditions as in 
measuring the test amount. For example, if a test sample is obtained from a subject's 
blood serum sample and a marker is detected using a particular probe, then a control 
amount of the marker is preferably determined from a serum sample of a patient using 
the same probe. It is preferred that the control amount of marker is determined based 
upon a significant number of samples from normal subjects who do not have human 
cancer so that it reflects variations of the marker amounts in that population. 
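
The comparison to a control can be sketched in Python as follows. The control 
amount is the median (or mean) of intensities from normal subjects, as described 
above; the two-standard-deviation margin, the function names and the intensity 
values are assumptions introduced for this example only. 

```python
from statistics import mean, median, stdev

def control_amount(normal_amounts, use_median=True):
    """Average or median marker amount across samples from normal subjects."""
    return median(normal_amounts) if use_median else mean(normal_amounts)

def compare_to_control(test_amount, normal_amounts, margin_sd=2.0):
    """Label a test amount relative to the control.

    margin_sd is a hypothetical tolerance: differences smaller than
    margin_sd standard deviations of the normal population count as similar.
    """
    ctrl = control_amount(normal_amounts)
    spread = stdev(normal_amounts)
    if test_amount > ctrl + margin_sd * spread:
        return "higher"
    if test_amount < ctrl - margin_sd * spread:
        return "lower"
    return "similar"

# Made-up peak intensities from five normal subjects (arbitrary units).
normals = [10.0, 12.0, 11.0, 9.0, 13.0]
```

Using the spread of the normal population in the comparison reflects the point 
made above that the control should capture the variation of marker amounts in 
that population. 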

Data generated by mass spectrometry can then be analyzed by computer 
software. The software can comprise code that converts signal from the mass 
spectrometer into computer readable form. The software also can include code that 
applies an algorithm to the analysis of the signal to determine whether the signal 
represents a "peak" in the signal corresponding to a marker of this invention, or other 
useful markers. The software also can include code that executes an algorithm that 
compares signal from a test sample to a typical signal characteristic of "normal" and 
human cancer and determines the closeness of fit between the two signals. The 
software also can include code indicating which the test sample is closest to, thereby 
providing a probable diagnosis. 
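
A minimal Python sketch of the two software steps just described, peak detection 
and closeness of fit, is given below. The local-maximum rule, the Euclidean 
distance measure, the reference profiles and all numbers are assumptions chosen 
for illustration; the document does not specify which algorithms the software uses. 

```python
import math

def find_peaks(intensities, min_height):
    """Return indices of local maxima whose height is at least min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        if (intensities[i] >= min_height
                and intensities[i] > intensities[i - 1]
                and intensities[i] >= intensities[i + 1]):
            peaks.append(i)
    return peaks

def distance(a, b):
    """Euclidean distance between two equally long lists of peak heights."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_profile(test_heights, profiles):
    """Name of the reference profile with the smallest distance to the test."""
    return min(profiles, key=lambda name: distance(test_heights, profiles[name]))

# Made-up digitized signal and hypothetical reference profiles (peak heights).
signal = [0, 1, 5, 1, 0, 2, 9, 3, 0]
profiles = {"normal": [5.0, 2.0], "cancer": [2.0, 8.0]}

peaks = find_peaks(signal, min_height=4)
heights = [signal[i] for i in peaks]
label = closest_profile(heights, profiles)
```

Here the closest profile only suggests a probable classification; in practice the 
reference profiles would be built from many samples measured under the same 
conditions. 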

In accordance with the present invention and the methods described herein, 
pre-invasive or even benign tumors may be diagnosed by identifying the biomarkers 
which cause a pre-invasive tumor to progress to a malignant tumor. The type of 
biomarkers identified and amounts of biomarker may correlate with the jump from a 
pre-invasive tumor to a malignant stage tumor. Therapy such as immediate excision 
of the tumor or therapies such as chemotherapy or radiation therapy can be 
implemented prior to the tumor becoming invasive. The identification of the 
pre-invasive biomarkers can be used in diagnosis with conventional methods such as, 
for example, in breast cancer, use of mammograms. 

The present invention thus provides for the immediate identification of a 
pre-invasive tumor by identifying the biomarkers associated with such tumors, so that 
the patient may be given life-saving therapy. Furthermore, the costs of long term 
treatment of cancer patients will also be reduced. 

More specifically, the present invention is based upon the discovery of 
protein markers that are differentially present in samples of human cancer patients and 
control subjects, and the application of this discovery in methods for aiding a human 
cancer diagnosis and tumor stage progression. Some of these protein markers are 
found at an elevated level and/or more frequently in samples from human cancer 
patients compared to a control (e.g., women in whom human cancer is undetectable). 
Accordingly, the amount of one or more markers found in a test sample compared to a 
control, or the mere detection of one or more markers in the test sample, provides 
useful information regarding the probability of whether a subject being tested has 
human cancer or not. 

The protein markers of the present invention have a number of other uses. For 
example, the markers can be used to screen for compounds that modulate the 
expression of the markers in vitro or in vivo, which compounds in turn may be useful 
in treating or preventing human cancer in patients. In another example, markers can 
be used to monitor responses to certain treatments of human cancer. In yet another 
example, the markers can be used in heredity studies. For instance, certain 
markers may be genetically linked. This can be determined by, e.g., analyzing 
samples from a population of human cancer patients whose families have a history of 
human cancer. The results can then be compared with data obtained from, e.g., 
human cancer patients whose families do not have a history of human cancer. The 
markers that are genetically linked may be used as a tool to determine if a subject 
whose family has a history of human cancer is pre-disposed to having human cancer. 

In another aspect, the invention provides methods for detecting markers which 
are differentially present in the samples of a human cancer patient and a control (e.g., 
women in whom human cancer is undetectable). The markers can be detected in a 
number of biological samples. The sample is preferably a biological fluid sample. 
Examples of a biological fluid sample useful in this invention include blood, blood 
serum, plasma, nipple aspirate, urine, tears, saliva, etc. Because all of the markers are 
found in blood serum, blood serum is a preferred sample source for embodiments of 
the invention. 

Any suitable methods can be used to detect one or more of the markers 
described herein. These methods include, without limitation, mass spectrometry (e.g., 
laser desorption/ionization mass spectrometry), fluorescence (e.g., sandwich 
immunoassay), surface plasmon resonance, ellipsometry and atomic force 
microscopy. 

In preferred embodiments, the method of resolution involves 
Surface-Enhanced Laser Desorption/Ionization ("SELDI") mass spectrometry, in which 
the surface of the mass spectrometry probe comprises adsorbents that bind the markers. 
SELDI is an affinity based MS method in which proteins are selectively adsorbed to a 
chemically modified surface (ProteinChip® arrays, Ciphergen Biosystems, Inc., 
Fremont, CA), and impurities are removed by washing with buffer. By combining an 
array of different surfaces and wash conditions, high speed, high-resolution 
chromatographic separations are achieved on chip. M. Merchant et al., 
Electrophoresis 2000;21:1164-67. 

SELDI TOF-MS offers high-throughput protein profiling. Like many other 
types of high-throughput expression data, protein array data are often characterized by 
a large number of variables (the mass peaks) relative to a small sample size (the 
number of specimens). An important issue in analyzing such data to screen for 
disease-associated biomarkers is to extract as much information as possible from a 
limited number of samples and to avoid selecting biomarkers whose performances are 
influenced mostly by non-disease related artifacts in the data. The effective and 
appropriate use of bioinformatics tools becomes very critical. 

In other preferred embodiments, immobilized metal affinity ProteinChip 
arrays and SELDI are used for high-throughput screening of potential serum 
biomarkers for early detection of breast cancer. For example, a total of 169 
retrospective serum samples from patients with or without breast cancer were 
obtained from Johns Hopkins Clinical Chemistry Serum Banks and analyzed 
simultaneously. Proteins bound to the chelated metal (through histidine, tryptophan, 
cysteine or phosphorylated amino acids) were analyzed on a PBS-II mass reader 
(Ciphergen Biosystems, Inc., Fremont, CA). The complex protein profiles were 
analyzed using a collection of bioinformatics tools. A panel of three biomarkers was 
selected based on their consistently significant contribution to the optimal separation 
of stages 0-I breast cancer patients versus the non-cancer controls (Healthy + Benign). 
The effectiveness of the selected biomarkers was then tested using independent data 
from stages II-III breast cancer patients and through bootstrap cross-validation. 
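
The exact bootstrap cross-validation procedure is not spelled out here. One common 
reading, sketched below in Python, repeatedly draws a training set with replacement 
and scores the resulting classifier on the samples left out of that draw. The 
single-marker threshold classifier, the assumption that the marker is higher in 
cancer, and the intensity values are all hypothetical. 

```python
import random

def train_threshold(samples):
    """samples: list of (intensity, label). Returns the midpoint of class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    if not pos or not neg:
        return None  # a resample with only one class cannot be trained on
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def bootstrap_accuracy(data, rounds=200, seed=0):
    """Mean held-out accuracy over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        held_out = [data[i] for i in range(n) if i not in chosen]
        cut = train_threshold(train)
        if cut is None or not held_out:
            continue
        correct = sum((x > cut) == (y == 1) for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Made-up intensities: controls (label 0) low, cancers (label 1) high.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (8.0, 1), (9.0, 1), (10.0, 1), (11.0, 1)]
estimate = bootstrap_accuracy(data)
```

Because the two made-up groups do not overlap, the estimate here is perfect; with 
real data the spread of the per-round scores indicates how stable the performance 
is likely to be on future samples. 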

II. PREPARATION OF MARKERS 

Preferably, the sample is prepared prior to detection of biomarkers. Typically, 
this involves collection of a sample from a subject to be tested. The sample can be 
any biological sample from the subject. Preferably, it is a biological fluid or a 
derivative thereof such as blood, plasma, serum, urine, lymphatic fluid or fluid from 
ductal lavage. Most preferably, the sample is serum. 

It may be useful to pre-fractionate the sample and to collect fractions 
determined to contain the biomarkers. Methods of pre-fractionation include, for 
example, size exclusion chromatography, ion exchange chromatography, heparin 
chromatography, affinity chromatography, sequential extraction, gel electrophoresis 
and liquid chromatography. The analytes also may be modified prior to detection. 
These methods are useful to simplify the sample for further analysis. For example, it 
can be useful to remove high abundance proteins, such as albumin, from blood before 
analysis. However, the markers of the present invention are detectable by SELDI 
after no more fractionation than isolating serum from blood. 

In one embodiment, a sample can be pre-fractionated according to size of 
proteins in a sample using size exclusion chromatography. For a biological sample 
wherein the amount of sample available is small, preferably a size selection spin 
column is used. For example, a K30 spin column (available from Princeton 
Separation, Ciphergen Biosystems, Inc., etc.) can be used. In general, the first 
fraction that is eluted from the column ("fraction 1") has the highest percentage of 
high molecular weight proteins; fraction 2 has a lower percentage of high molecular 
weight proteins; fraction 3 has an even lower percentage of high molecular weight 
proteins; fraction 4 has the lowest amount of large proteins; and so on. Each fraction 
can then be analyzed by gas phase ion spectrometry for the detection of markers. 

In another embodiment, a sample can be pre-fractionated by anion exchange 
chromatography. Anion exchange chromatography allows pre-fractionation of the 
proteins in a sample roughly according to their charge characteristics. For example, a 
Q anion-exchange resin can be used (e.g., Q HyperD F, Biosepra), and a sample can 
be sequentially eluted with eluants having different pH's (see Figure 3 and Example 
section VI B). Anion exchange chromatography allows separation of biomolecules in 
a sample that are more negatively charged from other types of biomolecules. Proteins 
that are eluted with an eluant having a high pH are likely to be weakly negatively 
charged, and a fraction that is eluted with an eluant having a low pH is likely to be 
strongly negatively charged. Thus, in addition to reducing complexity of a sample, 
anion exchange chromatography separates proteins according to their binding 
characteristics. 

In yet another embodiment, a sample can be pre-fractionated by heparin 
chromatography. Heparin chromatography allows pre-fractionation of the markers in 
a sample also on the basis of affinity interaction with heparin and charge 
characteristics. Heparin, a sulfated mucopolysaccharide, will bind markers with 
positively charged moieties and a sample can be sequentially eluted with eluants 
having different pH's or salt concentrations. Markers eluted with an eluant having a 
low pH are more likely to be weakly positively charged. Markers eluted with an 
eluant having a high pH are more likely to be strongly positively charged. Thus, 
heparin chromatography also reduces the complexity of a sample and separates 
markers according to their binding characteristics. 

In yet another embodiment, a sample can be pre-fractionated by removing 
proteins that are present in a high quantity or that may interfere with the detection of 
markers in a sample. For example, in a blood serum sample, serum albumin is present 
in a high quantity and may obscure the analysis of markers. Thus, a blood serum 
sample can be pre-fractionated by removing serum albumin. Serum albumin can be 
removed using a substrate that comprises adsorbents that specifically bind serum 
albumin. For example, a column which comprises, e.g., Cibacron blue agarose 
(which has a high affinity for serum albumin) or anti-serum albumin antibodies can be 
used (see, e.g., Figures 2 and 4). 

In yet another embodiment, a sample can be pre-fractionated by isolating 
proteins that have a specific characteristic, e.g., are glycosylated. For example, a 
blood serum sample can be fractionated by passing the sample over a lectin 
chromatography column (which has a high affinity for sugars). Glycosylated proteins 
will bind to the lectin column and non-glycosylated proteins will pass through in the 
flow through. Glycosylated proteins are then eluted from the lectin column with an 
eluant containing a sugar, e.g., N-acetyl-glucosamine, and are available for further 
analysis. 

Many types of affinity adsorbents exist which are suitable for pre-fractionating 
blood serum samples. An example of one other type of affinity chromatography 
available to pre-fractionate a sample is a single stranded DNA spin column. These 
columns bind proteins which are basic or positively charged. Bound proteins are then 
eluted from the column using eluants containing denaturants or high pH. 

Thus there are many ways to reduce the complexity of a sample based on the 
binding properties of the proteins in the sample, or the characteristics of the proteins 
in the sample. 

In yet another embodiment, a sample can be fractionated using a sequential 
extraction protocol. In sequential extraction, a sample is exposed to a series of 
adsorbents to extract different types of biomolecules from a sample. For example, a 
sample is applied to a first adsorbent to extract certain proteins, and an eluant 
containing non-adsorbent proteins (i.e., proteins that did not bind to the first 
adsorbent) is collected. Then, the fraction is exposed to a second adsorbent. This 
further extracts various proteins from the fraction. This second fraction is then 
exposed to a third adsorbent, and so on. 

Any suitable materials and methods can be used to perform sequential 
extraction of a sample. For example, a series of spin columns comprising different 
adsorbents can be used. In another example, a multi-well comprising different 
adsorbents at its bottom can be used. In another example, sequential extraction can be 
performed on a probe adapted for use in a gas phase ion spectrometer, wherein the 
probe surface comprises adsorbents for binding biomolecules. In this embodiment, 
the sample is applied to a first adsorbent on the probe, which is subsequently washed 
with an eluant. Markers that do not bind to the first adsorbent are removed with an 
eluant. The markers that are in the fraction can be applied to a second adsorbent on 
the probe, and so forth. The advantage of performing sequential extraction on a gas 
phase ion spectrometer probe is that markers that bind to various adsorbents at every 
stage of the sequential extraction protocol can be analyzed directly using a gas phase 
ion spectrometer. 

In yet another embodiment, biomolecules in a sample can be separated by 
high-resolution electrophoresis, e.g., one or two-dimensional gel electrophoresis. A 
fraction containing a marker can be isolated and further analyzed by gas phase ion 
spectrometry. Preferably, two-dimensional gel electrophoresis is used to generate a 
two-dimensional array of spots of biomolecules, including one or more markers. See, 
e.g., Jungblut and Thiede, Mass Spectr. Rev. 16:145-162 (1997). 

The two-dimensional gel electrophoresis can be performed using methods 
known in the art. See, e.g., Deutscher ed., Methods in Enzymology vol. 182. 
Typically, biomolecules in a sample are separated by, e.g., isoelectric focusing, during 
which biomolecules in a sample are separated in a pH gradient until they reach a spot 
where their net charge is zero (i.e., isoelectric point). This first separation step results 
in a one-dimensional array of biomolecules. The biomolecules in the one-dimensional 
array are further separated using a technique generally distinct from that used in the 
first separation step. For example, in the second dimension, biomolecules separated 
by isoelectric focusing are further separated using a polyacrylamide gel, such as 
polyacrylamide gel electrophoresis in the presence of sodium dodecyl sulfate 
(SDS-PAGE). SDS-PAGE gel allows further separation based on molecular mass of 
biomolecules. Typically, two-dimensional gel electrophoresis can separate 
chemically different biomolecules in the molecular mass range from 1000-200,000 Da 
within complex mixtures. 

Biomolecules in the two-dimensional array can be detected using any suitable 
methods known in the art. For example, biomolecules in a gel can be labeled or 
stained (e.g., Coomassie Blue or silver staining). If gel electrophoresis generates 
spots that correspond to the molecular weight of one or more markers of the 
invention, the spot is further analyzed by gas phase ion spectrometry. For example, 
spots can be excised from the gel and analyzed by gas phase ion spectrometry. 
Alternatively, the gel containing biomolecules can be transferred to an inert 
membrane by applying an electric field. Then a spot on the membrane that 
approximately corresponds to the molecular weight of a marker can be analyzed by 
gas phase ion spectrometry. In gas phase ion spectrometry, the spots can be analyzed 
using any suitable techniques, such as MALDI or SELDI (e.g., using ProteinChip® 
array) as described in detail below. 

Prior to gas phase ion spectrometry analysis, it may be desirable to cleave 
biomolecules in the spot into smaller fragments using cleaving reagents, such as 
proteases (e.g., trypsin). The digestion of biomolecules into small fragments provides 
a mass fingerprint of the biomolecules in the spot, which can be used to determine the 
identity of markers if desired. 
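
The idea of a mass fingerprint can be illustrated with an in-silico tryptic digest. 
The Python sketch below applies the commonly used rule that trypsin cleaves after 
lysine (K) or arginine (R) unless the next residue is proline (P), then sums 
standard monoisotopic residue masses plus one water per peptide. The example 
sequence is made up, and the function names are introduced here for illustration. 

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(seq):
    """Cleave after K or R, except when the following residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

# Made-up sequence used only to show the cleavage rule.
sequence = "AGKRPLLRSTK"
peptides = tryptic_digest(sequence)
fingerprint = sorted(round(peptide_mass(p), 4) for p in peptides)
```

The resulting list of masses is what would be matched against a protein database 
to help identify a marker; missed cleavages and modifications, which occur in 
practice, are ignored in this sketch. 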

In yet another embodiment, high performance liquid chromatography (HPLC) 
can be used to separate a mixture of biomolecules in a sample based on their different 
physical properties, such as polarity, charge and size. HPLC instruments typically 
consist of a reservoir of mobile phase, a pump, an injector, a separation column, and a 
detector. Biomolecules in a sample are separated by injecting an aliquot of the 
sample onto the column. Different biomolecules in the mixture pass through the 
column at different rates due to differences in their partitioning behavior between the 
mobile liquid phase and the stationary phase. A fraction that corresponds to the 
molecular weight and/or physical properties of one or more markers can be collected. 
The fraction can then be analyzed by gas phase ion spectrometry to detect markers. 
For example, the spots can be analyzed using either MALDI or SELDI (e.g., using 
ProteinChip® array) as described in detail below. 

Optionally, a marker can be modified before analysis to improve its resolution 
or to determine its identity. For example, the markers may be subject to proteolytic 
digestion before analysis. Any protease can be used. Proteases, such as trypsin, that 
are likely to cleave the markers into a discrete number of fragments are particularly 
useful. The fragments that result from digestion function as a fingerprint for the 
markers, thereby enabling their detection indirectly. This is particularly useful where 
there are markers with similar molecular masses that might be confused for the 
marker in question. Also, proteolytic fragmentation is useful for high molecular 
weight markers because smaller markers are more easily resolved by mass 
spectrometry. In another example, biomolecules can be modified to improve 
detection resolution. For instance, neuraminidase can be used to remove terminal 
sialic acid residues from glycoproteins to improve binding to an anionic adsorbent 
(e.g., cationic exchange ProteinChip® arrays) and to improve detection resolution. In 
another example, the markers can be modified by the attachment of a tag of particular 
molecular weight that specifically binds to molecular markers, further distinguishing 
them. Optionally, after detecting such modified markers, the identity of the markers 
can be further determined by matching the physical and chemical characteristics of 
the modified markers in a protein database (e.g., SwissProt). 

III. CAPTURE OF MARKERS 

Biomarkers are preferably captured with capture reagents immobilized to a 
solid support, such as any biochip described herein, a multiwell microtiter plate or a 
resin. In particular, the biomarkers of this invention are preferably captured on 
SELDI protein biochips. Capture can be on a chromatographic surface or a 
biospecific surface. Any of the SELDI protein biochips comprising reactive surfaces 
can be used to capture and detect the biomarkers of this invention. However, the 
biomarkers of this invention bind well to immobilized metal chelates. Thus, the 
IMAC-3 and IMAC 30 biochips, which comprise nitriloacetic acid functionalities that 
adsorb transition metal ions, such as Cu++ and Ni++, by chelation, are the preferred 
SELDI biochips for capturing the biomarkers of this invention. SELDI biochips also 
can be derivatized with the antibodies that specifically capture the biomarkers, or they 
can be derivatized with capture reagents, such as protein A or protein G, that bind 
immunoglobulins. Then the biomarkers can be captured in solution using specific 
antibodies and the captured markers isolated on chip through the capture reagent. 

In general, a sample containing the biomarkers, such as serum, is placed on the 
active surface of a biochip for a sufficient time to allow binding. Then, unbound 
molecules are washed from the surface using a suitable eluant, such as phosphate 
buffered saline. In general, the more stringent the eluant, the more tightly the proteins 
must be bound to be retained after the wash. The retained protein biomarkers now 
can be detected by appropriate means. 






IV. DETECTION AND CHARACTERIZATION OF MARKERS 
A. Spectrometry 

Analytes captured on the surface of a protein biochip can be detected by any 
method known in the art. This includes, for example, mass spectrometry, 
fluorescence, surface plasmon resonance, ellipsometry and atomic force microscopy. 
Mass spectrometry, and particularly SELDI mass spectrometry, is a particularly useful 
method for detection of the biomarkers of this invention. Preferably, a laser 
desorption time-of-flight mass spectrometer is used in embodiments of the invention. 

Matrix-assisted laser desorption/ionization mass spectrometry, or MALDI-MS, 
is a method of mass spectrometry that involves the use of an energy absorbing 
molecule, frequently called a matrix, for desorbing proteins intact from a probe 
surface. MALDI is described, for example, in U.S. patent 5,118,937 (Hillenkamp et 
al.) and U.S. patent 5,045,694 (Beavis and Chait). In MALDI-MS the sample is 
typically mixed with a matrix material and placed on the surface of an inert probe. 
Exemplary energy absorbing molecules include cinnamic acid derivatives, sinapinic 
acid ("SPA"), cyano hydroxy cinnamic acid ("CHCA") and dihydroxybenzoic acid. 
Other suitable energy absorbing molecules are known to those skilled in this art. The 
matrix dries, forming crystals that encapsulate the analyte molecules. Then the 
analyte molecules are detected by laser desorption/ionization mass spectrometry. 
MALDI-MS is useful for detecting the biomarkers of this invention if the complexity 
of a sample has been substantially reduced using the preparation methods described 
above. 

Surface-enhanced laser desorption/ionization mass spectrometry, or SELDI-MS,
represents an improvement over MALDI for the fractionation and detection of
biomolecules, such as proteins, in complex mixtures and is a preferred method of the
present invention. SELDI is a method of mass spectrometry in which biomolecules,
such as proteins, are captured on the surface of a protein biochip using capture
reagents that are bound there. Typically, non-bound molecules are washed from the
probe surface before interrogation. SELDI technology is available from Ciphergen
Biosystems, Inc., Fremont CA as part of the ProteinChip® System. ProteinChip®
arrays are particularly adapted for use in SELDI. SELDI is described, for example,
in: United States Patent 5,719,060 ("Method and Apparatus for Desorption and
Ionization of Analytes," Hutchens and Yip, February 17, 1998), United States Patent
6,225,047 ("Use of Retentate Chromatography to Generate Difference Maps,"
Hutchens and Yip, May 1, 2001) and Weinberger et al., "Time-of-flight mass
spectrometry," in Encyclopedia of Analytical Chemistry, R.A. Meyers, ed., pp. 11915-
11918, John Wiley & Sons, Chichester, 2000.

Markers on the substrate surface can be desorbed and ionized using gas phase
ion spectrometry. Any suitable gas phase ion spectrometer can be used as long as it
allows markers on the substrate to be resolved. Preferably, the gas phase ion
spectrometer allows quantitation of markers. Preferably, markers captured on a
protein biochip are detected using a laser desorption time-of-flight mass spectrometer,
as described herein.

In laser desorption mass spectrometry, a substrate or a probe comprising
markers is introduced into an inlet system. The markers are desorbed and ionized into
the gas phase by laser from the ionization source. The ions generated are collected by
an ion optic assembly, and then in a time-of-flight mass analyzer, ions are accelerated
through a short high voltage field and allowed to drift into a high vacuum chamber. At
the far end of the high vacuum chamber, the accelerated ions strike a sensitive detector
surface at different times. Since the time-of-flight is a function of the mass of the
ions, the elapsed time between ion formation and ion detector impact can be used to
identify the presence or absence of markers of specific mass to charge ratio.
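
The dependence of flight time on mass-to-charge ratio described above can be
illustrated with the idealized relation for a linear time-of-flight analyzer,
t = L * sqrt(m / (2 z e V)). The following is a minimal sketch only: the drift length
and accelerating voltage are hypothetical values chosen for illustration, not
parameters taken from this application, and real instruments deviate from the
ideal model.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, kilograms

def flight_time(mass_da, charge, accel_volts, drift_m):
    """Idealized drift time (seconds) of an ion through a field-free tube."""
    kinetic_energy = charge * E_CHARGE * accel_volts          # z*e*V, joules
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return drift_m / velocity

# Hypothetical instrument: 20 kV acceleration, 1 m drift tube.
t_light = flight_time(5000.0, 1, 20000.0, 1.0)
t_heavy = flight_time(20000.0, 1, 20000.0, 1.0)
```

Under this model a fourfold increase in mass at the same charge doubles the flight
time, and ions with the same mass-to-charge ratio arrive together.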

In another embodiment, an ion mobility spectrometer can be used to detect
markers. The principle of ion mobility spectrometry is based on different mobility of
ions. Specifically, ions of a sample produced by ionization move at different rates,
due to their difference in, e.g., mass, charge, or shape, through a tube under the
influence of an electric field. The ions (typically in the form of a current) are
registered at the detector, which can then be used to identify a marker or other
substances in a sample. One advantage of ion mobility spectrometry is that it can
operate at atmospheric pressure.

In yet another embodiment, a total ion current measuring device can be used to
detect and characterize markers. This device can be used when the substrate has
only a single type of marker. When a single type of marker is on the substrate, the
total current generated from the ionized marker reflects the quantity and other
characteristics of the marker. The total ion current produced by the marker can then
be compared to a control (e.g., a total ion current of a known compound). The
quantity or other characteristics of the marker can then be determined.

B. Immunoassay 

In another embodiment, an immunoassay can be used to detect and analyze
markers in a sample. This method comprises: (a) providing an antibody that
specifically binds to a marker; (b) contacting a sample with the antibody; and (c)
detecting the presence of a complex of the antibody bound to the marker in the
sample.

To prepare an antibody that specifically binds to a marker, purified markers or
their nucleic acid sequences can be used. Nucleic acid and amino acid sequences for
markers can be obtained by further characterization of these markers. For example,
each marker can be peptide mapped with a number of enzymes (e.g., trypsin, V8
protease, etc.). The molecular weights of digestion fragments from each marker can
be used to search the databases, such as SwissProt database, for sequences that will
match the molecular weights of digestion fragments generated by various enzymes.
Using this method, the nucleic acid and amino acid sequences of other markers can be
identified if these markers are known proteins in the databases.
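
The database search described above can be sketched as an in silico digestion
followed by a mass comparison. This is an illustrative sketch, not the search
software referred to in the text: the cleavage rule (trypsin cuts after K or R unless
the next residue is P), the use of monoisotopic masses, the 0.5 Da tolerance and the
function names are all assumptions made for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER

def count_matches(observed_masses, candidate_sequence, tol=0.5):
    """Number of observed fragment masses explained by the candidate protein."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(candidate_sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed_masses)
```

A candidate database entry that explains many of the observed fragment masses is a
plausible identification; a real search would also score the match statistically.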

Alternatively, the proteins can be sequenced using protein ladder sequencing.
Protein ladders can be generated by, for example, fragmenting the molecules and
subjecting fragments to enzymatic digestion or other methods that sequentially
remove a single amino acid from the end of the fragment. Methods of preparing
protein ladders are described, for example, in International Publication WO 93/24834
(Chait et al.) and United States Patent 5,792,664 (Chait et al.). The ladder is then
analyzed by mass spectrometry. The difference in the masses of the ladder fragments
identifies the amino acid removed from the end of the molecule.
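
Reading a ladder amounts to matching each successive mass difference against the
residue masses. The sketch below assumes monoisotopic masses and a hypothetical
0.02 Da tolerance; leucine and isoleucine have the same mass, so both are reported
when either could explain a step.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_ladder(ladder_masses, tol=0.02):
    """For each step from a heavier to the next lighter ladder member, list the
    residues whose mass matches the difference ('?' if none matches)."""
    ordered = sorted(ladder_masses, reverse=True)
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        hits = sorted(r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls
```

The calls are returned in the order in which residues were removed, that is, from
the degraded end of the molecule inward.
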

If the markers are not known proteins in the databases, nucleic acid and amino
acid sequences can be determined with knowledge of even a portion of the amino acid
sequence of the marker. For example, degenerate probes can be made based on the
N-terminal amino acid sequence of the marker. These probes can then be used to
screen a genomic or cDNA library created from a sample from which a marker was
initially detected. The positive clones can be identified, amplified, and their
recombinant DNA sequences can be subcloned using techniques which are well
known. See, e.g., Current Protocols for Molecular Biology (Ausubel et al., Green
Publishing Assoc. and Wiley-Interscience 1989) and Molecular Cloning: A
Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory, NY
2001).

Using the purified markers or their nucleic acid sequences, antibodies that
specifically bind to a marker can be prepared using any suitable methods known in the
art. See, e.g., Coligan, Current Protocols in Immunology (1991); Harlow & Lane,
Antibodies: A Laboratory Manual (1988); Goding, Monoclonal Antibodies:
Principles and Practice (2d ed. 1986); and Kohler & Milstein, Nature 256:495-497
(1975). Such techniques include, but are not limited to, antibody preparation by
selection of antibodies from libraries of recombinant antibodies in phage or similar
vectors, as well as preparation of polyclonal and monoclonal antibodies by
immunizing rabbits or mice (see, e.g., Huse et al., Science 246:1275-1281 (1989);
Ward et al., Nature 341:544-546 (1989)).

After the antibody is provided, a marker can be detected and/or quantified
using any suitable immunological binding assay known in the art (see, e.g., U.S.
Patent Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Useful assays include,
for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent
assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay.
These methods are also described in, e.g., Methods in Cell Biology: Antibodies in Cell
Biology, volume 37 (Asai, ed. 1993); Basic and Clinical Immunology (Stites & Terr,
eds., 7th ed. 1991); and Harlow & Lane, supra.

Generally, a sample obtained from a subject can be contacted with the
antibody that specifically binds the marker. Optionally, the antibody can be fixed to a
solid support to facilitate washing and subsequent isolation of the complex, prior to
contacting the antibody with a sample. Examples of solid supports include glass or
plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead.
Antibodies can also be attached to a probe substrate or ProteinChip® array described
above. The sample is preferably a biological fluid sample taken from a subject.
Examples of biological fluid samples include blood, serum, plasma, nipple aspirate,
urine, tears, saliva, etc. In a preferred embodiment, the biological fluid comprises
blood serum. The sample can be diluted with a suitable eluant before contacting the
sample to the antibody.

After incubating the sample with antibodies, the mixture is washed and the
antibody-marker complex formed can be detected. This can be accomplished by
incubating the washed mixture with a detection reagent. This detection reagent may
be, e.g., a second antibody which is labeled with a detectable label. Exemplary
detectable labels include magnetic beads (e.g., DYNABEADS™), fluorescent dyes,
radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others
commonly used in an ELISA), and colorimetric labels such as colloidal gold or
colored glass or plastic beads. Alternatively, the marker in the sample can be detected
using an indirect assay, wherein, for example, a second, labeled antibody is used to
detect bound marker-specific antibody, and/or in a competition or inhibition assay
wherein, for example, a monoclonal antibody which binds to a distinct epitope of the
marker is incubated simultaneously with the mixture.

Throughout the assays, incubation and/or washing steps may be required after
each combination of reagents. Incubation steps can vary from about 5 seconds to
several hours, preferably from about 5 minutes to about 24 hours. However, the
incubation time will depend upon the assay format, marker, volume of solution,
concentrations and the like. Usually the assays will be carried out at ambient
temperature, although they can be conducted over a range of temperatures, such as
10°C to 40°C.

Immunoassays can be used to determine presence or absence of a marker in a
sample as well as the quantity of a marker in a sample. First, a test amount of a
marker in a sample can be detected using the immunoassay methods described above.
If a marker is present in the sample, it will form an antibody-marker complex with an
antibody that specifically binds the marker under suitable incubation conditions
described above. The amount of an antibody-marker complex can be determined by
comparing to a standard. A standard can be, e.g., a known compound or another
protein known to be present in a sample. As noted above, the test amount of marker
need not be measured in absolute units, as long as the unit of measurement can be
compared to a control.

The methods for detecting these markers in a sample have many applications. 
For example, one or more markers can be measured to aid human cancer diagnosis or 
prognosis. In another example, the methods for detection of the markers can be used 
to monitor responses in a subject to cancer treatment. In another example, the 
methods for detecting markers can be used to assay for and to identify compounds
that modulate expression of these markers in vivo or in vitro. In a preferred example, 
the biomarkers are used to differentiate between the different stages of tumor 
progression, thus aiding in determining appropriate treatment and extent of metastasis 
of the tumor. 


V. Data Analysis 

Data generation in mass spectrometry begins with the detection of ions by an
ion detector. A typical laser desorption mass spectrometer can employ a nitrogen
laser at 337.1 nm. A useful pulse width is about 4 nanoseconds. Generally, power
output of about 1-25 µJ is used. Ions that strike the detector generate an electric
potential that is digitized by a high speed time-array recording device that digitally
captures the analog signal. Ciphergen's ProteinChip® system employs an
analog-to-digital converter (ADC) to accomplish this. The ADC integrates detector
output at regularly spaced time intervals into time-dependent bins. The time intervals
typically are one to four nanoseconds long. Furthermore, the time-of-flight spectrum
ultimately analyzed typically does not represent the signal from a single pulse of
ionizing energy against a sample, but rather the sum of signals from a number of
pulses. This reduces noise and increases dynamic range. This time-of-flight data is
then subject to data processing. In Ciphergen's ProteinChip® software, data
processing typically includes TOF-to-M/Z transformation, baseline subtraction, and
high frequency noise filtering.

TOF-to-M/Z transformation involves the application of an algorithm that
transforms times-of-flight into mass-to-charge ratio (M/Z). In this step, the signals
are converted from the time domain to the mass domain. That is, each time-of-flight
is converted into mass-to-charge ratio, or M/Z. Calibration can be done internally or
externally. In internal calibration, the sample analyzed contains one or more analytes
of known M/Z. Signal peaks at times-of-flight representing these massed analytes are
assigned the known M/Z. Based on these assigned M/Z ratios, parameters are
calculated for a mathematical function that converts times-of-flight to M/Z. In
external calibration, a function that converts times-of-flight to M/Z, such as one
created by prior internal calibration, is applied to a time-of-flight spectrum without
the use of internal calibrants.
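
The calibration step can be sketched as a small least-squares fit. The text does not
specify the mathematical function, so the form used here, sqrt(M/Z) = k*t + c (the
usual idealization for a linear time-of-flight analyzer), is an assumption, as are the
function names. The fitted parameters can then be reused on other spectra, which
corresponds to external calibration.

```python
import math

def fit_calibration(times, known_mz):
    """Least-squares fit of sqrt(M/Z) = k*t + c from calibrant peaks.

    times     -- times-of-flight of the calibrant peaks
    known_mz  -- the known M/Z assigned to those peaks
    """
    n = len(times)
    roots = [math.sqrt(mz) for mz in known_mz]
    mean_t = sum(times) / n
    mean_r = sum(roots) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, roots))
    k = sxy / sxx
    c = mean_r - k * mean_t
    return k, c

def tof_to_mz(t, k, c):
    """Convert one time-of-flight to M/Z with fitted parameters k and c."""
    return (k * t + c) ** 2
```

At least two calibrants at different times-of-flight are needed; more calibrants
spread across the mass range of interest give a better-conditioned fit.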

Baseline subtraction improves data quantification by eliminating artificial,
reproducible instrument offsets that perturb the spectrum. It involves calculating a
spectrum baseline using an algorithm that incorporates parameters such as peak width,
and then subtracting the baseline from the mass spectrum.

High frequency noise signals are eliminated by the application of a smoothing
function. A typical smoothing function applies a moving average function to each
time-dependent bin. In an improved version, the moving average filter is a variable
width digital filter in which the bandwidth of the filter varies as a function of, e.g.,
peak bandwidth, generally becoming broader with increased time-of-flight. See, e.g.,
WO 00/70648, November 23, 2000 (Gavin et al., "Variable Width Digital Filter for
Time-of-flight Mass Spectrometry").
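
A fixed-width moving average over the time-dependent bins can be sketched as
follows. This is the simple version only, not the variable width filter of WO
00/70648; the window size and the edge handling (the window shrinks at the ends of
the spectrum) are assumptions made for the example.

```python
def moving_average(signal, window=5):
    """Centered moving average over a list of bin intensities.

    Near the ends of the spectrum the window is truncated rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed
```

A single-bin spike is spread over the window and reduced in height, while a flat
signal is left unchanged.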

A computer can transform the resulting spectrum into various formats for
displaying. In one format, referred to as "spectrum view or retentate map," a standard
spectral view can be displayed, wherein the view depicts the quantity of analyte
reaching the detector at each particular molecular weight. In another format, referred
to as "peak map," only the peak height and mass information are retained from the
spectrum view, yielding a cleaner image and enabling analytes with nearly identical
molecular weights to be more easily seen. In yet another format, referred to as "gel
view," each mass from the peak view can be converted into a grayscale image based
on the height of each peak, resulting in an appearance similar to bands on
electrophoretic gels. In yet another format, referred to as "3-D overlays," several
spectra can be overlaid to study subtle changes in relative peak heights. In yet another
format, referred to as "difference map view," two or more spectra can be compared,
conveniently highlighting unique analytes and analytes that are up- or down-regulated
between samples.

Analysis generally involves the identification of peaks in the spectrum that
represent signal from an analyte. Peak selection can, of course, be done by eye.
However, software is available as part of Ciphergen's ProteinChip® software that can
automate the detection of peaks. In general, this software functions by identifying
signals having a signal-to-noise ratio above a selected threshold and labeling the mass
of the peak at the centroid of the peak signal. In one useful application, many spectra
are compared to identify identical peaks present in some selected percentage of the
mass spectra. One version of this software clusters all peaks appearing in the various
spectra within a defined mass range, and assigns a mass (M/Z) to all the peaks that are
near the mid-point of the mass (M/Z) cluster.
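
The clustering of peaks across spectra can be sketched as follows. This is not the
ProteinChip® software's algorithm, whose details are not given here; it is a simple
greedy grouping under assumed parameters: a relative mass window (0.3% by default)
and a minimum fraction of spectra in which a cluster must appear.

```python
def cluster_peaks(spectra, window=0.003, min_fraction=0.5):
    """Group peak M/Z values from several spectra into shared clusters.

    spectra -- list of lists of peak M/Z values, one inner list per spectrum
    Returns the mid-point M/Z of each cluster that contains peaks from at
    least `min_fraction` of the spectra.
    """
    tagged = sorted((mz, idx) for idx, peaks in enumerate(spectra) for mz in peaks)
    clusters, current = [], []
    for mz, idx in tagged:
        # Start a new cluster once a peak falls outside the relative window
        # measured from the lightest member of the current cluster.
        if current and mz > current[0][0] * (1.0 + window):
            clusters.append(current)
            current = []
        current.append((mz, idx))
    if current:
        clusters.append(current)
    kept = []
    for members in clusters:
        n_spectra = len({idx for _, idx in members})
        if n_spectra / len(spectra) >= min_fraction:
            kept.append((members[0][0] + members[-1][0]) / 2.0)
    return kept
```

Each retained mid-point can then serve as the common M/Z label for the member
peaks, so that the same analyte is compared across spectra.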

Peak data from one or more spectra can be subject to further analysis by, for
example, creating a spreadsheet in which each row represents a particular mass
spectrum, each column represents a peak in the spectra defined by mass, and each cell
includes the intensity of the peak in that particular spectrum. Various statistical or
pattern recognition approaches can be applied to the data.

The spectra that are generated in embodiments of the invention can be
classified using a pattern recognition process that uses a classification model. In
general, the spectra will represent samples from at least two different groups for
which a classification algorithm is sought. For example, the groups can be
pathological v. non-pathological (e.g., cancer v. non-cancer), drug responder v. drug
non-responder, toxic response v. non-toxic response, progressor to disease state v.
non-progressor to disease state, phenotypic condition present v. phenotypic condition
absent.

In some embodiments, data derived from the spectra (e.g., mass spectra or
time-of-flight spectra) that are generated using samples such as "known samples" can
then be used to "train" a classification model. A "known sample" is a sample that is
pre-classified. The data that are derived from the spectra and are used to form the
classification model can be referred to as a "training data set". Once trained, the
classification model can recognize patterns in data derived from spectra generated
using unknown samples. The classification model can then be used to classify the
unknown samples into classes. This can be useful, for example, in predicting whether
or not a particular biological sample is associated with a certain biological condition
(e.g., diseased vs. non diseased).

The training data set that is used to form the classification model may
comprise raw data or pre-processed data. In some embodiments, raw data can be
obtained directly from time-of-flight spectra or mass spectra, and then may be
optionally "pre-processed" as described above.

Classification models can be formed using any suitable statistical
classification (or "learning") method that attempts to segregate bodies of data into
classes based on objective parameters present in the data. Classification methods may
be either supervised or unsupervised. Examples of supervised and unsupervised
classification processes are described in Jain, "Statistical Pattern Recognition: A
Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22,
No. 1, January 2000.

In supervised classification, training data containing examples of known
categories are presented to a learning mechanism, which learns one or more sets of
relationships that define each of the known classes. New data may then be applied to
the learning mechanism, which then classifies the new data using the learned
relationships. Examples of supervised classification processes include linear
regression processes (e.g., multiple linear regression (MLR), partial least squares
(PLS) regression and principal components regression (PCR)), binary decision trees
(e.g., recursive partitioning processes such as CART - classification and regression
trees), artificial neural networks such as backpropagation networks, discriminant
analyses (e.g., Bayesian classifier or Fisher analysis), logistic classifiers, and support
vector classifiers (support vector machines).

A preferred supervised classification method is a recursive partitioning
process. Recursive partitioning processes use recursive partitioning trees to classify
spectra derived from unknown samples. Further details about recursive partitioning
processes are in U.S. Provisional Patent Application Nos. 60/249,835, filed on
November 16, 2000, and 60/254,746, filed on December 11, 2000, and U.S.
Non-Provisional Patent Application Nos. 09/999,081, filed November 15, 2001, and
10/084,587, filed on February 25, 2002. All of these U.S. Provisional and
Non-Provisional Patent Applications are herein incorporated by reference in their
entirety for all purposes.
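
The general idea of recursive partitioning can be sketched with a small
CART-style tree. This is an illustrative sketch, not the method of the cited
applications: the Gini impurity criterion, the depth limit, the data layout (one row
of peak intensities per spectrum) and all function names are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Build a tree: a leaf is a class label, a node is (feature, thr, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def classify(tree, row):
    """Follow the splits until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree
```

A tree grown from pre-classified training spectra can then be applied to the peak
intensities of an unknown sample with `classify`.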

In other embodiments, the classification models that are created can be formed
using unsupervised learning methods. Unsupervised classification attempts to learn
classifications based on similarities in the training data set, without pre-classifying the
spectra from which the training data set was derived. Unsupervised learning methods
include cluster analyses. A cluster analysis attempts to divide the data into "clusters"
or groups that ideally should have members that are very similar to each other, and
very dissimilar to members of other clusters. Similarity is then measured using some
distance metric, which measures the distance between data items, and clusters
together data items that are closer to each other. Clustering techniques include
MacQueen's K-means algorithm and Kohonen's Self-Organizing Map algorithm.
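
A cluster analysis of this kind can be sketched with a basic K-means loop. Note the
assumptions: this is the batch (Lloyd-style) variant rather than MacQueen's
incremental update, it uses squared Euclidean distance, and it seeds the centroids
with the first k points so that the result is deterministic.

```python
def kmeans(points, k, iterations=20):
    """Batch K-means on tuples of floats.

    Returns (centroids, assignment), where assignment[i] is the index of the
    cluster to which points[i] belongs.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

In practice the seeding matters, and several random starts are usually compared.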

The classification models can be formed on and used on any suitable digital
computer. Suitable digital computers include micro, mini, or large computers using
any standard or specialized operating system such as a Unix, Windows™ or Linux™
based operating system. The digital computer that is used may be physically separate
from the mass spectrometer that is used to create the spectra of interest, or it may be
coupled to the mass spectrometer.

The training data set and the classification models according to embodiments
of the invention can be embodied by computer code that is executed or used by a
digital computer. The computer code can be stored on any suitable computer readable
media including optical or magnetic disks, sticks, tapes, etc., and can be written in any
suitable computer programming language including C, C++, visual basic, etc.

Data generated by desorption and detection of markers can be analyzed using
any suitable means. In one embodiment, data is analyzed with the use of a
programmable digital computer. The computer program generally contains a readable
medium that stores codes. Certain code can be devoted to memory that includes the
location of each feature on a probe, the identity of the adsorbent at that feature and the
elution conditions used to wash the adsorbent. The computer also contains code that
receives as input data on the strength of the signal at various molecular masses
received from a particular addressable location on the probe. This data can indicate
the number of markers detected, including the strength of the signal generated by each
marker.

Data analysis can include the steps of determining signal strength (e.g., height
of peaks) of a marker detected and removing "outliers" (data deviating from a
predetermined statistical distribution). The observed peaks can be normalized, a
process whereby the height of each peak relative to some reference is calculated. For
example, a reference can be background noise generated by instrument and chemicals
(e.g., energy absorbing molecule) which is set as zero in the scale. Then the signal
strength detected for each marker or other biomolecules can be displayed in the form
of relative intensities in the scale desired (e.g., 100). Alternatively, a standard (e.g., a
serum protein) may be admixed with the sample so that a peak from the standard can
be used as a reference to calculate relative intensities of the signals observed for each
marker or other markers detected.

The computer can transform the resulting data into various formats for
displaying. In one format, referred to as "spectrum view or retentate map," a standard
spectral view can be displayed, wherein the view depicts the quantity of marker
reaching the detector at each particular molecular weight. In another format, referred
to as "peak map," only the peak height and mass information are retained from the
spectrum view, yielding a cleaner image and enabling markers with nearly identical
molecular weights to be more easily seen. In yet another format, referred to as "gel
view," each mass from the peak view can be converted into a grayscale image based
on the height of each peak, resulting in an appearance similar to bands on
electrophoretic gels. In yet another format, referred to as "3-D overlays," several
spectra can be overlaid to study subtle changes in relative peak heights. In yet another
format, referred to as "difference map view," two or more spectra can be compared,
conveniently highlighting unique markers and markers which are up- or down-regulated
between samples. Marker profiles (spectra) from any two samples may be
compared visually. In yet another format, Spotfire Scatter Plot can be used, wherein
markers that are detected are plotted as a dot in a plot, wherein one axis of the plot
represents the apparent molecular weight of the markers detected and another axis
represents the signal intensity of markers detected. For each sample, markers that are
detected and the amount of markers present in the sample can be saved in a computer
readable medium. This data can then be compared to a control (e.g., a profile or
quantity of markers detected in control, e.g., women in whom human cancer is
undetectable).

V. DETERMINATION OF SUBJECT STATUS

Any biomarker, individually, is useful in aiding in the determination of breast
cancer status. First, the selected biomarker is measured in a subject sample using the
methods described herein, e.g., capture on a SELDI IMAC Ni biochip followed by
detection by mass spectrometry. Then, the measurement is compared with a
diagnostic amount or control that distinguishes a breast cancer status from a
non-cancer status. The diagnostic amount will reflect the information herein that a
particular biomarker is up-regulated or down-regulated in a cancer status compared
with a non-cancer status. As is well understood in the art, the particular diagnostic
amount used can be adjusted to increase sensitivity or specificity of the diagnostic
assay depending on the preference of the diagnostician. The test amount as compared
with the diagnostic amount thus indicates breast cancer status.
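
The trade-off between sensitivity and specificity when adjusting the diagnostic
amount can be sketched as follows. The function name and the marker levels used
with it are hypothetical and serve only to illustrate the comparison; they are not
data from this application.

```python
def sensitivity_specificity(cancer_values, control_values, cutoff, up_regulated=True):
    """Sensitivity and specificity of a single-marker test at a given cutoff.

    A sample is called positive when its marker level lies beyond the cutoff
    in the direction in which the marker changes in cancer.
    """
    def positive(v):
        return v >= cutoff if up_regulated else v <= cutoff
    true_pos = sum(positive(v) for v in cancer_values)
    true_neg = sum(not positive(v) for v in control_values)
    return true_pos / len(cancer_values), true_neg / len(control_values)
```

With hypothetical levels of 5, 6, 7 and 8 in cancer samples and 2, 3, 4 and 5.5 in
controls, a cutoff of 5 gives sensitivity 1.0 and specificity 0.75, while a cutoff of
6 gives sensitivity 0.75 and specificity 1.0.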

While individual biomarkers are useful diagnostic markers, it has been found
that a combination of biomarkers provides greater predictive value than single
markers alone. More particularly, Markers BC1, BC2 and BC3 are the most highly
discriminatory biomarkers, used either alone or in combination.

In order to use the biomarkers in combination, a logistic regression
algorithm is useful. The UMSA algorithm is particularly useful to generate a
diagnostic algorithm from test data. This algorithm is disclosed in Z. Zhang et al.,
Applying classification separability analysis to microarray data. In: Lin SM, Johnson
KF, eds. Methods of Microarray data analysis: papers from CAMDA '00. Boston:
Kluwer Academic Publishers, 2001:125-136; and Z. Zhang et al., Fishing Expedition
- a Supervised Approach to Extract Patterns from a Compendium of Expression
Profiles. In: Lin SM, Johnson KF, eds. Microarray Data Analysis II: Papers from
CAMDA '01. Boston: Kluwer Academic Publishers, 2002.

The learning algorithm will generate a multivariate classification (diagnostic)
algorithm tuned to the particular specificity and sensitivity desired by the operator.
The classification algorithm can then be used to determine breast cancer status. The
method also involves measuring the selected biomarkers in a subject sample (e.g.,
Marker BC1, BC2 and BC3). These measurements are submitted to the classification
algorithm. The classification algorithm generates an indicator score that indicates
breast cancer status.

The following examples are offered by way of illustration, not by way of
limitation. While specific examples have been provided, the above description is
illustrative and not restrictive. Any one or more of the features of the previously
described embodiments can be combined in any manner with one or more features of
any other embodiments in the present invention. Furthermore, many variations of the
invention will become apparent to those skilled in the art upon review of the
specification. The scope of the invention should, therefore, be determined not with
reference to the above description, but instead should be determined with reference to
the appended claims along with their full scope of equivalents.

All publications and patent documents cited in this application are
incorporated by reference in their entirety for all purposes to the same extent as if
each individual publication or patent document were so individually denoted. By
their citation of various references in this document, Applicants do not admit any
particular reference is "prior art" to their invention.


EXAMPLES 

GENERAL COMMENTS 

In the following Examples, the following Materials and Methods were used.

Samples.

Retrospective serum samples were obtained from the Johns Hopkins Clinical
Chemistry serum banks, according to the approved protocol by the Johns Hopkins
Joint Committee on Clinical Investigation. A total of 169 specimens were included in
this study. The cancer group consisted of 103 serum samples from breast cancer
patients at different clinical stages: Stage 0 (n=4), Stage I (n=38), Stage II (n=37) and
Stage III (n=24). Diagnoses were pathologically confirmed and specimens were
obtained prior to treatment. Age information was not available on six of these
patients. The median age of the remaining 96 patients was 56 years, ranging from 34
to 87 years. The non-cancer control group included serum from 25 women with benign
breast diseases (BN) and 41 healthy women (HC). Exact age information was not
available from 21 healthy women. The median age of the remaining 20 healthy women
was 45 years, ranging from 39 to 57 years. The median age of the benign condition
group was 48 years with range between 21 and 78 years. All samples were stored at
-80°C until use.

ProteinChip Analysis.

To 20 ^1 of each serum sample, 30 jil of 8M urea, 1% CHAPS in PBS, PH 7.4 
was added. The mixture was vortexed at 4**C for 15 minutes and diluted 1:40 in PBS. 
Immobilized metal afSnity capture chips (IMAC3) were activated with 50niM NiS04 
accordmg to manufecturer's instructions (Ciphergen Biosystems, Inc., CA). 50 \i\ of 
20 diluted samples were applied to each spot on the ProtemChip array by using a 96 well 
bioprocessor (Ciphergen Biosystems, Inc., CA), After binding at room temperature 
for 60 minxites on a platform shaker, the array was washed twice with 100 fil of PBS 
for 5 minutes followed by two quick rinses with 100 \x\ of dH20. After air-drying, 0.5 
\i\ of saturated smapinic acid (SPA) prepared in 50% acetonitrile, 0.5% trifluoroacetic 
25 acid was applied twice to each spot. Proteins bound to the chelated metal (through 
histidine, tryptophan, cysteine or phosphorylated amino acids) were detected on a 
PBS-n mass reader. Data was collected by averaging 80 laser shots with an intensity 
of 240 and a detector sensitivity of 8. Reproducibility was estimated using two 
representative serum samples, one from the healthy controls and one from the cancer 
30 patients. Each serum sample was spotted on all 8 bait surfaces of one IMAC-Ni chip 
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in each of the two bioprocessors. Coefficience of variance was estimated for the 
selected mass peaks. 
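
As an illustration of the reproducibility estimate, the following minimal sketch computes a coefficient of variation from replicate spot intensities. The replicate values and the function name are hypothetical; the text does not give the raw replicate data or state which standard deviation estimator was used (the sample standard deviation is assumed here).

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(intensities) / abs(m)


# Hypothetical intensities of one peak across 16 replicate spots
# (8 spots on one chip in each of two bioprocessors).
replicates = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0,
              10.3, 9.6, 10.2, 10.1, 9.9, 10.0, 10.4, 9.8]
cv_percent = coefficient_of_variation(replicates)
print(f"CV = {cv_percent:.1f}%")
```

A small CV relative to the between-group difference of a peak suggests that the difference is not explained by assay noise alone.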
Bioinformatics and biostatistics.

Qualified mass peaks (S/N > 5, cluster mass window at 0.3%) with M/Z
between 2K and 150K were selected and the peak intensities were normalized to the
total ion current using ProteinChip Software 3.0 (Ciphergen Biosystems, Inc., CA).
Further preprocessing included a logarithmic transformation of the peak
intensity data in order to obtain a more consistent level of data variance across the
entire spectrum range of interest (M/Z 2 kD to 150 kD).
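
The preprocessing steps above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout, the function name, the base-10 logarithm, the division by a supplied total ion current, and the small floor value are all hypothetical choices, since the text does not describe how ProteinChip Software 3.0 implements these steps.

```python
import math

MZ_MIN, MZ_MAX = 2_000.0, 150_000.0  # M/Z range stated in the text
SNR_MIN = 5.0                        # S/N threshold stated in the text


def preprocess(peaks, total_ion_current, floor=1e-6):
    """Filter peaks, normalize to total ion current, and log-transform.

    peaks: list of (mz, intensity, snr) tuples for one spectrum.
    total_ion_current: summed intensity of the spectrum (assumed given).
    floor: guard against log(0); an assumption, not from the source.
    """
    if total_ion_current <= 0:
        raise ValueError("total ion current must be positive")
    out = {}
    for mz, intensity, snr in peaks:
        if snr > SNR_MIN and MZ_MIN <= mz <= MZ_MAX:
            normalized = intensity / total_ion_current
            out[mz] = math.log10(max(normalized, floor))
    return out


# Hypothetical spectrum: one peak below 2 kD and one with low S/N are dropped.
peaks = [(1500.0, 40.0, 12.0), (4300.0, 25.0, 8.0),
         (8100.0, 10.0, 3.0), (8900.0, 50.0, 20.0)]
result = preprocess(peaks, total_ion_current=1000.0)
print(result)
```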
The software package ProPeak (3Z Informatics, SC) was used to compute and
rank the contribution of each individual peak towards the optimal separation of two
diagnostic groups. ProPeak implements the linear version of the Unified Maximum
Separability Analysis (UMSA) algorithm that was first reported for use in microarray
data analysis. Z. Zhang et al., Applying Classification Separability Analysis to
Microarray Data, in Proc. of Critical Assessment of Techniques for Microarray Data
Analysis (CAMDA'00), Kluwer Academic Publishers, 2001. The key feature of the
UMSA algorithm is the incorporation of data distribution information into a structural
risk minimization learning algorithm (Vapnik VN, Statistical Learning Theory, John
Wiley & Sons, Inc., New York, 1998) to identify a direction along which the two
classes of data are best separated. This direction is represented as a linear
combination (weighted sum) of the original variables. The weight assigned to each
variable in this combination measures the contribution of the variable towards the
separation of the two classes of data.
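
To make the idea of a separating direction concrete, the sketch below builds a weighted sum of variables with a simple diagonal linear discriminant (mean difference divided by pooled variance, per variable). This is explicitly a stand-in and not the UMSA algorithm, whose details are not given here; the data and function names are hypothetical.

```python
from statistics import mean, variance


def diagonal_discriminant_weights(pos, neg):
    """Per-variable weights: (mean_pos - mean_neg) / pooled variance.

    pos, neg: lists of samples, each a list of log peak intensities.
    NOTE: a simple stand-in for illustration, not UMSA.
    """
    weights = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        pooled = ((len(p) - 1) * variance(p) + (len(n) - 1) * variance(n)) \
            / (len(p) + len(n) - 2)
        weights.append((mean(p) - mean(n)) / pooled)
    return weights


def project(sample, weights):
    """Project one sample onto the direction (weighted sum of variables)."""
    return sum(w * x for w, x in zip(weights, sample))


# Hypothetical data: variable 0 separates the groups, variable 1 does not.
pos = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3]]
neg = [[0.5, 0.2], [0.6, 0.3], [0.4, 0.1]]
weights = diagonal_discriminant_weights(pos, neg)
print(weights)
```

In this toy case the informative variable receives a large weight and the uninformative one a weight near zero, which mirrors how the weight is read as a measure of contribution to the separation.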

ProPeak offers three UMSA-based analytical modules. The first is a
Component Analysis module, which projects each specimen as an individual point
onto a three-dimensional component space. The components (axes) are linear
combinations of the original spectrum peak intensities. The axes correspond to
directions along which two pre-specified groups of data achieve maximum
separability. The separation between the two groups of data can be inspected in an
interactive 3D display. The second module is Stepwise Selection, which uses a
backward stepwise selection process to apply UMSA to compute a significance score
for individual peaks and rank them according to their collective contribution towards
the maximal separation of the two pre-specified groups of data. A positive or
negative score indicates a relatively elevated or decreased expression level of the
corresponding mass peak for the diseased group, whereas the absolute value of the
score represents its relative importance towards data separation. To avoid selecting
peaks based only on unrelated artifacts in the data, the third module of ProPeak,
Bootstrap, uses a bootstrap procedure to repeat UMSA for multiple runs, each time
randomly leaving out a fixed percentage of the samples from both groups. The
median and mean ranks and the corresponding standard deviation are estimated for
each peak. A potential biomarker should be a peak with top median and mean ranks and
a minimal rank standard deviation. To establish an objective selection
criterion, the same bootstrap procedure was also applied to a random dataset that, peak
by peak, simulates the distribution of the actual data. Results from the actual data are
compared against those from the simulated data to establish a statistically
appropriate cutoff value on rank standard deviation for selecting peaks with consistent
performance.
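
The leave-out procedure for assessing rank stability can be sketched as below. The per-peak score is a stand-in (absolute standardized mean difference), not the UMSA significance score, and the data, seeds, and function names are hypothetical.

```python
import random
from statistics import mean, median, pstdev, stdev


def separation_score(pos_vals, neg_vals):
    """Stand-in score (NOT UMSA): absolute standardized mean difference."""
    spread = (stdev(pos_vals) + stdev(neg_vals)) / 2.0
    return abs(mean(pos_vals) - mean(neg_vals)) / max(spread, 1e-12)


def bootstrap_rank_stats(pos, neg, runs=100, leave_out=0.3, seed=0):
    """Repeat scoring while leaving out a fixed fraction of both groups."""
    rng = random.Random(seed)
    n_peaks = len(pos[0])
    keep_pos = max(2, round(len(pos) * (1.0 - leave_out)))
    keep_neg = max(2, round(len(neg) * (1.0 - leave_out)))
    ranks = [[] for _ in range(n_peaks)]
    for _ in range(runs):
        p = rng.sample(pos, keep_pos)
        n = rng.sample(neg, keep_neg)
        scores = [separation_score([row[j] for row in p],
                                   [row[j] for row in n])
                  for j in range(n_peaks)]
        # Rank 1 is the peak with the highest score in this run.
        order = sorted(range(n_peaks), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            ranks[j].append(rank)
    return [{"peak": j, "median": median(r), "mean": mean(r), "sd": pstdev(r)}
            for j, r in enumerate(ranks)]


# Hypothetical data: peak 0 differs between groups, peaks 1-4 are noise.
data_rng = random.Random(1)
pos = [[data_rng.gauss(2.0, 0.3)] + [data_rng.gauss(0.0, 0.3) for _ in range(4)]
       for _ in range(20)]
neg = [[data_rng.gauss(0.0, 0.3) for _ in range(5)] for _ in range(20)]
stats = bootstrap_rank_stats(pos, neg)
for row in stats:
    print(row)
```

Running the same function on a simulated random dataset would give the reference distribution of rank standard deviations from which a cutoff could be chosen.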

Example 1

Identification of Biomarkers that Detect Breast Cancer at the Early Stages

In order to identify potential biomarkers that can detect breast cancer at early
stages, protein profiles of specimens from stages 0-I breast cancer patients were
compared against those of the non-cancer controls. The analysis was performed in
multiple iterations using all three modules in ProPeak. Through this iterative process
the original full spectrum was reduced to a small subset of mass peaks that had
consistently demonstrated a high level of significance in the optimal separation
between the two selected diagnostic groups.

Once a small panel of biomarkers was selected, their ability to detect breast
cancer was independently tested using data from stages II and III cancer patients.
Based on the entire data set, a composite index was derived using multivariate logistic
regression. Descriptive statistics including p-values from two-sample t-tests were
estimated. Receiver-operating-characteristic (ROC) curve analysis was then
performed on the selected biomarkers and the composite index. Performance criteria
such as sensitivities and specificities of the composite index were estimated using a
bootstrap procedure. Efron B and Tibshirani R. Bootstrap Methods for Standard
Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical
Science. 1986;1:54-75. In this procedure, the total patient data set was divided
through random re-sampling into a training set to derive a composite index through
logistic regression and a test set for computing sensitivities and specificities. This
re-sampling process was repeated many times. The results from multiple runs were
finally aggregated to form the bootstrap estimate of the sensitivities and specificities.
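
The re-sampling estimate described above can be sketched as follows. The logistic regression is fitted by plain gradient descent for self-containment; the learning rate, epoch count, synthetic data, 0.5-probability cutoff (index > 0), and all function names are assumptions for illustration and are not taken from the source.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def composite_index(w, x):
    """Linear predictor (log-odds); w[0] is the intercept."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(composite_index(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def bootstrap_performance(X, y, runs=20, train_fraction=0.7, seed=0):
    """Average test-set sensitivity and specificity over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_train = round(train_fraction * len(X))
    sens, spec = [], []
    for _ in range(runs):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        tp = fn = tn = fp = 0
        for i in test:
            positive = composite_index(w, X[i]) > 0
            if y[i] == 1:
                tp, fn = tp + positive, fn + (not positive)
            else:
                fp, tn = fp + positive, tn + (not positive)
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    return sum(sens) / len(sens), sum(spec) / len(spec)


# Hypothetical data: three well-separated markers, 40 cases and 40 controls.
data_rng = random.Random(42)
X = [[data_rng.gauss(1.0, 0.2) for _ in range(3)] for _ in range(40)] + \
    [[data_rng.gauss(0.0, 0.2) for _ in range(3)] for _ in range(40)]
y = [1] * 40 + [0] * 40
sens, spec = bootstrap_performance(X, y)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Keeping the index derivation inside each training split, as here, is what prevents the test-set estimates from being optimistically biased.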

Example 2

Peak Detection and Data Preprocessing

Serum proteins retained on the IMAC-Ni2+ chips were analyzed on a PBS-II
mass reader. A total of 147 qualified mass peaks (S/N > 5, cluster mass window at
0.3%) with M/Z over 2 KD were selected. Peaks of M/Z less than 2 KD were excluded
to eliminate interference from the matrix. Mass accuracy of 0.1% was achieved by
external calibration using All In 1 Protein Standard (Ciphergen Biosystems, Inc., CA).
A representative spectrum obtained from such analysis is shown in Figure 2.
Logarithmic transformation was applied to the peak intensity values. The plots in
Figure 3 illustrate the effect of variance reduction and equalization through
logarithmic transformation.
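
One way to read the "cluster mass window at 0.3%" criterion is that peaks from different spectra are treated as the same peak when their M/Z values agree to within 0.3%. The sketch below implements that reading with a running cluster mean; the grouping rule, function name, and M/Z values are assumptions, not the documented behavior of the vendor software.

```python
def cluster_peaks(mz_values, window=0.003, mz_min=2000.0):
    """Group M/Z values pooled from many spectra into peak clusters.

    A value joins the current cluster when it lies within `window`
    (a fraction; 0.003 = 0.3%) of the cluster's running mean M/Z.
    Values below `mz_min` are dropped as matrix interference.
    """
    clusters = []
    for mz in sorted(v for v in mz_values if v >= mz_min):
        if clusters:
            center = sum(clusters[-1]) / len(clusters[-1])
            if abs(mz - center) <= window * center:
                clusters[-1].append(mz)
                continue
        clusters.append([mz])
    return clusters


# Hypothetical M/Z values pooled from several spectra.
clusters = cluster_peaks([1500.0, 4300.0, 4305.0, 8100.0, 8120.0, 8900.0, 8950.0])
print(clusters)
```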

Example 3

Biomarker Selection Based on Early-Stage Cancer and Non-cancer Controls

To identify biomarkers with potential for early detection of breast cancer,
UMSA was performed using early-stage cancer as the positive group (Stage 0-I,
n=42) and the non-cancer controls (HC+BN, n=66) as the negative group.
Separability between the two groups was first tested using the UMSA-derived linear
combination of all 147 mass peaks. The early-stage cancer group was separable from the
non-cancer group when the entire protein profiles were compared. Figure 4A plots
the early-stage cancer (lighter) versus non-cancer (darker) in the UMSA component
3D space.

To select biomarkers that consistently perform well, UMSA was applied
repeatedly for a total of 100 runs, each with a 30% leave-out rate, using the ProPeak
Bootstrap module. The same procedure was also applied to a simulated random data
set. The minimal standard deviation derived from the simulated data was 7. In the
experimental data, 15 mass peaks had a standard deviation less than this value. This
subset of mass peaks was selected as candidate biomarkers for further analysis. Their
mean ranks and the corresponding standard deviations are plotted in Figure 4.

To further rank the peaks in this reduced set of candidate biomarkers, the
Stepwise Selection module of ProPeak was applied. The absolute values of the relative
significance scores of the 15 peaks (see Table 4) are plotted in descending order in
Figure 8A, which shows that the majority of separability between the two groups of
data was contributed by the first six peaks. Among these six peaks, four are unique.
The other two were identified as doubly charged forms of two of the unique peaks
using ProteinChip Software 3.0. The recognition of both the doubly charged and the
singly charged forms of the peaks suggests their importance in discriminating the
two selected diagnostic groups. After removing the doubly charged forms, the four
unique peaks were recombined and evaluated using Stepwise Selection again. The
recalculated relative significance scores are plotted in Figure 6B. The three top-scoring
peaks, designated BC1, BC2, and BC3, were finally selected as the potential
biomarkers for detection of breast cancer. BC1 appeared down-regulated (scored
negative) while BC2 and BC3 appeared up-regulated (scored positive). A 3D plot of
stages 0-I breast cancer versus the non-cancer controls using these three biomarkers is
shown in Figure 4B.
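
The relationship between singly and doubly charged forms of the same protein can be checked arithmetically. The sketch below assumes protonated ions ([M+H]+ and [M+2H]2+) and a 0.2% matching tolerance; both are assumptions, the peak values are hypothetical, and this is not the procedure used by ProteinChip Software 3.0.

```python
PROTON_MASS = 1.00728  # Da


def expected_doubly_charged(mz_singly):
    """M/Z of [M+2H]2+ given the M/Z of [M+H]+ for the same molecule."""
    return (mz_singly + PROTON_MASS) / 2.0


def find_doubly_charged_pairs(peaks_mz, tolerance=0.002):
    """Return (singly, doubly) pairs whose M/Z values match within tolerance."""
    pairs = []
    for single in peaks_mz:
        target = expected_doubly_charged(single)
        for other in peaks_mz:
            if other != single and abs(other - target) <= tolerance * target:
                pairs.append((single, other))
    return pairs


# Hypothetical peak list containing two singly/doubly charged pairs.
pairs = find_doubly_charged_pairs([3000.6, 6000.2, 7500.0, 3750.4, 5200.0])
print(pairs)
```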

Example 4

Evaluation of the Selected Biomarkers

The descriptive statistics of these three biomarkers are listed in Table 1.
Figure 7 shows results from the ROC analysis. Among the three biomarkers, BC3
demonstrated the most individual diagnostic power. Its distributions over the
diagnostic groups, including clinical stages of cancer patients, are plotted in Figure 8A.
The sensitivities and specificities of using BC3 alone at a cutoff value of 0.8 to
differentiate the diagnostic groups are listed in Table 2A.

The estimated CV of the log transformed peak intensity was 6% for BC1, 7%
for BC2, and 13% for BC3 (data not shown). Among the three biomarkers, BC3 had
the largest CV of 13%. In comparison, the mean value of BC3 in the cancer patients
was almost 90% above that in the non-cancer controls (calculated based on data in
Table 1).

Table 1. Descriptive statistics of BC1, BC2, BC3, and the logistic regression derived composite
index. Differences between non-cancer controls and stages 0-I, and between non-cancer controls and
stages II-III, are both statistically significant (p<0.000001) for all three biomarkers and the composite
index.

                 Non-cancer Controls   Breast Cancer Patients   Breast Cancer Patients
                 (n=66)                Stages 0-I (n=42)        Stages II-III (n=51)
                 Mean      Stdev       Mean      Stdev          Mean      Stdev
BC1              0.302     0.312       -0.118    0.244          -0.081    0.258
BC2              0.981     0.358       1.411     0.154          1.295     0.205
BC3              0.626     0.252       0.993     0.193          1.003     0.234
Comp. Index      -0.375    0.313       0.425     0.257          0.349     0.242
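
As a rough plausibility check on the significance statement, a two-sample t statistic can be computed directly from the summary values in Table 1. The sketch below uses the unequal-variance (Welch) form, which is an assumption; the source states only that two-sample t-tests were used. The inputs are the BC3 values for non-cancer controls and stages 0-I.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summaries."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# BC3: stages 0-I (mean 0.993, SD 0.193, n=42) vs controls (0.626, 0.252, n=66).
t_stat, df = welch_t(0.993, 0.193, 42, 0.626, 0.252, 66)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

A t statistic of this magnitude with roughly one hundred degrees of freedom is consistent with a very small p-value.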



Example 5

Combined Use of Three Selected Biomarkers

Figure 9 compares the distribution of cancer patients at all clinical stages
against non-cancer controls in all pair-wise biomarker combinations. Based on this
observation, multivariate logistic regression was used to combine the three selected
biomarkers to form a single-valued composite index. The descriptive statistics of this
composite index are appended in Table 1. Its distributions over the various diagnostic
groups are plotted in Figure 8B. ROC curve analysis of the composite index gave a
much-improved AUC compared to those from the individual biomarkers (Figure 7).
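
For reference, the area under an ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control. The minimal sketch below computes it that way; the scores are hypothetical and the function name is illustrative.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical composite-index values for three cases and three controls.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(f"AUC = {auc:.3f}")
```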
Bootstrap cross-validation was used to estimate the diagnostic performance of
the composite index (20 runs; in each run, 70% of the samples were randomly selected for
composite index derivation and the remaining 30% for testing). The estimated
sensitivities and specificities are listed in Table 2B.

The levels of the three potential biomarkers were also evaluated in relation to
pT (tumor size) and pN (lymph node metastasis) categories. No significant
correlation was observed.



Table 2A. Diagnostic performance of BC3.

Cutoff=0.8   Non-Cancer Controls                 Breast Cancer Patients, Stage
             HC          Benign     Subtotal     0-I        II         III        Subtotal
Positive     0           6          6            37 (88%)   29 (78%)   22 (92%)   88 (85%)
Negative     41 (100%)   19 (76%)   60 (91%)     5          8          2          15
Total        41          25         66           42         37         24         103
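
The percentages in Table 2A follow directly from the counts. The short sketch below recomputes the overall sensitivity and specificity of BC3 at the 0.8 cutoff from those counts; the variable and function names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of cancer patients testing positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-cancer controls testing negative."""
    return true_neg / (true_neg + false_pos)


# Counts from Table 2A: (positive, negative) per group.
cancer = {"0-I": (37, 5), "II": (29, 8), "III": (22, 2)}
controls = {"HC": (0, 41), "Benign": (6, 19)}

tp = sum(pos for pos, _ in cancer.values())
fn = sum(neg for _, neg in cancer.values())
fp = sum(pos for pos, _ in controls.values())
tn = sum(neg for _, neg in controls.values())

overall_sens = sensitivity(tp, fn)   # 88 / 103
overall_spec = specificity(tn, fp)   # 60 / 66
print(f"sensitivity={overall_sens:.1%}, specificity={overall_spec:.1%}")
```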



Table 2B. Bootstrap estimated diagnostic performance of logistic regression derived
composite index using BC1, BC2 and BC3 (20 runs, leave out rate = 30%).

LR at        Non-Cancer Controls                   Breast Cancer Patients, Stage
cutoff=0     HC       Benign   Subtotal            0-I     II      III     Subtotal
Positive                                           93%     85%     94%     93% (85-100%)
Negative     100%     85%      91% (82-100%)

Example 6

Detecting breast carcinoma in situ by serum proteomic analysis using ProteinChip®
arrays and SELDI-mass spectrometry

The protein profiles of 169 serum samples of women with and without breast
cancer were analyzed, and a panel of three proteins (8.9 KD, 8.1 KD, 4.3 KD) was
identified that in combined use can detect breast cancer with high sensitivity (Stage
0-III, 93%) and specificity (Healthy Control + Benign, 91%). Among the three
markers, the 8.9 KD protein performed best, achieving a sensitivity of 85% and a
specificity of 91%.

Ductal and Lobular Carcinoma In Situ (DCIS and LCIS) are the earliest forms
(Stage 0) of non-invasive breast cancer. Nearly 100% of women diagnosed at this
early stage of breast cancer can be cured. To validate these markers for early
detection of breast cancer, the performance of the 3 previously identified biomarkers
was evaluated using sera collected by a collaborating institution. The sample cohort
consisted of 17 women with DCIS, 1 with LCIS, 8 with benign breast diseases, and 40
age-matched apparently healthy controls (45-65 years). Protein profiles were
generated in triplicate using IMAC-Ni (Immobilized Metal Affinity Capture)
ProteinChip arrays under the same experimental conditions as described supra. Log
relative intensities of each of the three proteins were compared between different
diagnostic groups using the two-sample t-test. The expression patterns of two (8.9 KD
and 8.1 KD) of the three markers were consistent with previous results. The p-values
and the areas under the ROC curves of these two biomarkers are summarized in Table
3.
Table 3

Summary of Statistical Analysis

          Two-sample t-test p-value   Area under the ROC-curve   Diagnostic performance
          DCIS/HC     DCIS/HC+BN      DCIS/HC    DCIS/HC+BN      Sensitivity   Specificity   Specificity
                                                                 (DCIS)        (HC)          (HC+BN)
8.9 KD    0.000059    0.000072        0.80       0.76            72% (13/18)   65% (26/40)   63% (30/48)
8.1 KD    0.0180      0.0194          0.76       0.71            61% (11/18)   75% (30/40)   75% (36/48)

DCIS, Ductal Carcinoma In Situ; LCIS, Lobular Carcinoma In Situ; HC, Healthy Control; BN, Benign



The following specific references also are incorporated by reference herein.

1. Jemal A, Thomas A, Murray T, Thun M. Cancer statistics, 2002. CA Cancer J
Clin. 2002;52:23-47.

2. National Cancer Institute. CancerNet PDQ Cancer Information Summaries.
Monographs on "Screening for breast cancer." http://cancernet.nci.nih.gov/pdq.html
(Updated January 2001).

3. Antman K, Shea S. Screening mammography under age 50. JAMA.
1999;281:1470-2.

4. Chan DW, Beveridge RA, Muss H, Fritsche HA, Hortobagyi G, Theriault R, et al.
Use of Truquant BR Radioimmunoassay for early detection of breast cancer
recurrence in patients with stage II and stage III disease. J Clin Oncol.
1997;15:2322-2328.

5. Karas M, Hillenkamp F. Laser desorption ionization of proteins with molecular
masses exceeding 10,000 daltons. Anal Chem. 1988;60:2299-2301.

6. Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric
analysis of macromolecules. Rapid Commun Mass Spectrom. 1993;7:576-80.

7. Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser
desorption/ionization-time of flight-mass spectrometry. Electrophoresis.
2000;21:1164-67.

8. Wright Jr GL, Cazares LH, Leung S-M, Nasim S, Adam B-L, Yip T-T, et al.
ProteinChip® surface enhanced laser desorption/ionization (SELDI) mass
spectrometry: a novel protein biochip technology for detection of prostate cancer
biomarkers in complex protein mixtures. Prostate Cancer Prostate Dis. 1999;2:264-76.

9. Hlavaty JJ, Partin AW, Kusinitz F, Shue MJ, Stieg M, Bennett K, Briggman JV.
Mass spectroscopy as a discovery tool for identifying serum markers for prostate
cancer. Clin Chem. [Abstract]. 2001;47:1924-26.

10. Paweletz CP, Trock B, Pennanen M, Tsangaris T, Magnant C, Liotta LA, et al.
Proteomic patterns of nipple aspirate fluids obtained by SELDI-TOF: potential for
new biomarkers to aid in the diagnosis of breast cancer. Dis Markers. 2001;17:301-7.

11. Vlahou A, Schellhammer PF, Mendrinos S, Patel K, Kondylis FI, Gong L, et al.
Development of a novel proteomic approach for the detection of transitional cell
carcinoma of the bladder in urine. Am J Pathol. 2001;158:1491-502.

12. Petricoin EF III, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM,
et al. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet.
2002;359:572-577.

13. Zhang Z, Page G, Zhang H. Applying Classification Separability Analysis to
Microarray Data, in Proc. of Critical Assessment of Techniques for Microarray Data
Analysis (CAMDA'00), Kluwer Academic Publishers, 2001.

14. Vapnik VN, Statistical Learning Theory, John Wiley & Sons, Inc., New York,
1998.

15. Efron B and Tibshirani R. Bootstrap Methods for Standard Errors, Confidence
Intervals, and Other Measures of Statistical Accuracy. Statistical Science. 1986;1:54-75.

The invention has been described in detail with reference to particular
embodiments thereof. However, it will be appreciated that those skilled in the art,
upon consideration of this disclosure, may make modifications within the spirit and
scope of the invention.






What is claimed is: 

1. A method of qualifying breast cancer in a subject comprising:

(a) measuring at least one biomarker in a sample from a subject, and

(b) correlating the measurement with breast cancer status,
wherein the biomarker is selected from the group consisting of:

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII
Marker XIV,

and combinations thereof.

2. The method of claim 1 further comprising: 

(c) managing subject treatment based on the status. 

3. The method of claim 2 wherein managing subject treatment is selected from
ordering more tests, performing surgery, and taking no further action.

4. The method of claim 2 further comprising:

(d) measuring the at least one biomarker after subject management.

5. The method of claim 1 wherein the breast cancer status is selected from the
group consisting of the subject's risk of cancer, the presence or absence of disease, the stage
of disease, and the effectiveness of treatment.

6. The method of claim 1 wherein measuring comprises detecting by mass
spectrometry.

7. The method of claim 1 wherein at least one biomarker is selected from
Marker I (BC1), Marker II (BC2), and Marker III (BC3).

8. The method of claim 1 further comprising measuring a known breast cancer
biomarker in a sample from the subject and correlating measurement of the known
biomarker and the measurement of any one or more of Markers I through XIV with breast
cancer status.

9. The method of claim 8 wherein the known biomarker is selected from CA
15.3 or CA 27.29.

10. The method of claim 1 comprising measuring Marker I (BC1), Marker II
(BC2), and Marker III (BC3).

11. The method of claim 10 further comprising measuring a known breast
cancer biomarker in a sample from the subject and correlating measurement of the known
biomarker and the measurement of any one or more of Markers I through XIV with breast
cancer status.

12. The method of claim 11 wherein the known biomarker is selected from CA
15.3 or CA 27.29.

13. The method of claim 1 wherein measuring comprises: 
(a) providing a subject sample of blood or a blood derivative; 
(b) capturing one or more of Markers I through XIV from the sample on a surface of a
substrate comprising capture reagents that bind the protein biomarkers.

14. The method of claim 13 wherein the substrate is a SELDI probe comprising
an IMAC Ni surface and wherein the protein biomarkers are detected by SELDI.

15. The method of claim 13 wherein the substrate is a SELDI probe comprising
biospecific affinity reagents that bind one or more of Markers I through XIV and wherein
the protein biomarkers are detected by SELDI.

16. The method of claim 13 wherein the substrate is a microtiter plate
comprising biospecific affinity reagents that bind one or more of Markers I through XV and
the protein biomarkers are detected by immunoassay.

17. The method of claim 1 wherein measuring is selected from detecting the
presence or absence of the biomarker(s), quantifying the amount of marker(s), and
qualifying the type of biomarker.

18. The method of claim 1 wherein at least one biomarker is measured using a
biochip array.

19. The method of claim 18 wherein the biochip array is a protein chip array.

20. The method of claim 18 wherein the biochip array is a nucleic acid array.

21. The method of claim 18 wherein at least one biomarker is immobilized on
the biochip array.

22. The method of claim 1 wherein the protein biomarkers are measured by
SELDI.

23. The method of claim 1 wherein the protein biomarkers are measured by
immunoassay.

24. The method of claim 1 wherein the correlating is performed by a software 
classification algorithm. 

25. The method of claim 1 wherein the sample is selected from blood, serum
and plasma.

26. A method comprising:

(a) measuring a plurality of biomarkers in a sample from the subject, wherein the plurality
of biomarkers is selected from the group consisting of:

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV.

27. The method of claim 26 wherein the plurality includes Marker I (BC1),
Marker II (BC2), and Marker III (BC3).

28. The method of claim 26 further comprising measuring a known biomarker.






29. The method of claim 26 wherein the known biomarker is selected from CA
15.3 or CA 27.29.

30. The method of claim 26 wherein the protein biomarkers are detected by
SELDI or immunoassay.

31. The method of claim 26 wherein the sample is selected from blood, serum
and plasma.

32. A method comprising: 

(a) measuring at least one biomarker in a sample from a subject, wherein the biomarker is 
selected from the group consisting of:

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV,

and combinations thereof.

33. The method of claim 32 further comprising measuring a known biomarker.

34. The method of claim 33 wherein the known biomarker is selected from CA
15.3 or CA 27.29.

35. The method of claim 32 wherein the protein biomarkers are detected by
SELDI or immunoassay.

36. The method of claim 32 wherein the sample is selected from blood, serum
and plasma.

37. A kit comprising: 

(a) a capture reagent that binds a biomarker selected from the group consisting of: 
Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV,

and combinations thereof; and
(b) a container comprising at least one of the biomarkers.

38. The kit of claim 37 wherein the capture reagent binds a plurality of the
biomarkers.

39. The kit of claim 37 wherein the capture reagent is a SELDI probe.

40. The kit of claim 37 further comprising a capture reagent that binds CA 15.3
or CA 27.29.

41. The kit of claim 37 further comprising a second capture reagent that binds
one of the biomarkers that the first capture reagent does not bind.

42. A kit comprising: 

(a) a first capture reagent that binds at least one biomarker selected from the group 
consisting: 

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV; and

(b) a second capture reagent that binds at least one of the biomarkers that is not bound by
the first capture reagent.

43. The kit of claim 42 wherein the at least one capture reagent is an antibody.

44. The kit of claim 42 further comprising an MS probe to which at least one
capture reagent is attached or is attachable.

45. The kit of claim 42 wherein the capture reagent is an immobilized metal
chelate (IMAC).

46. The kit of claim 42 further comprising a wash solution that selectively
allows retention of the bound biomarker to the capture reagent as compared with other
biomarkers after washing.

47. A kit comprising: 

(a) a first capture reagent that binds at least one biomarker selected from the group 
consisting of: 

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV;

(b) instructions for using the capture reagent to detect the biomarker. 

48. The kit of claim 47 wherein the capture reagent is an antibody. 
49. The kit of claim 47 further comprising an MS probe to which the capture
reagent is attached or is attachable.

50. The kit of claim 47 wherein the capture reagent is an immobilized metal 
chelate (IMAC). 

51. The kit of claim 47 further comprising a wash solution that selectively
allows retention of the bound biomarker to the capture reagent as compared with other
biomarkers after washing.

52. The kit of claim 47 further comprising written instructions for use of the kit 
for detection of cancer. 

53. The kit of claim 52 wherein the instructions provide for contacting a test
sample with the capture agent and detecting one or more biomarkers retained by the capture
agent.

54. An article of manufacture comprising:

(a) at least one capture reagent that binds to at least two biomarkers selected from the
group consisting of:

Marker I (BC1)
Marker II (BC2)
Marker III (BC3)
Marker IV
Marker V
Marker VI
Marker VII
Marker VIII
Marker IX
Marker X
Marker XI
Marker XII
Marker XIII, and
Marker XIV.

55. The article of manufacture of claim 54 further comprising a capture reagent 
that binds to a known biomarker. 

56. The article of manufacture of claim 55 wherein the known biomarker is CA
15.3 or CA 27.29.

57. A system comprising: 

(a) a plurality of capture reagents each of which has bound to it a different biomarker 
selected from the group consisting of: 